lovemun / SEXCMD

5 stars 5 forks source link

Error in the SEXCMD.R script #5

Open ValentinaBoP opened 4 years ago

ValentinaBoP commented 4 years ago

Hi Seongmun,

I was using your script on several avian libraries and I found a couple of problems with the code in SEXCMD.R that you might want to check out:

Lines 118-127:

if (sextype == "XY"){
  ct <- data.frame(Length=do.call(rbind, marker_length_list[startsWith(x = names(marker_seq_list), prefix = "chrX")]), Count=do.call(rbind, marker_seq_list[startsWith(x = names(marker_seq_list), prefix = "chrX")]))
  ct <- cbind(ct, do.call(rbind, marker_seq_list[startsWith(x = names(marker_seq_list), prefix = "chrY")]))
  colnames(ct) <- c("Length", "Xcount", "Ycount")
}
if (sextype == "ZW"){
  ct <- data.frame(Length=do.call(rbind, marker_length_list[startsWith(x = names(marker_seq_list), prefix = "chrZ")]), Count=do.call(rbind, marker_seq_list[startsWith(x = names(marker_seq_list), prefix = "chrW")]))
  ct <- cbind(ct, do.call(rbind, marker_seq_list[startsWith(x = names(marker_seq_list), prefix = "chrW")]))
  colnames(ct) <- c("Length", "Zcount", "Wcount")
}

In the ZW system there is an inconsistency with respect to the XY system chunk of code. I think this line: ct <- data.frame(Length=do.call(rbind, marker_length_list[startsWith(x = names(marker_seq_list), prefix = "chrZ")]), Count=do.call(rbind, marker_seq_list[startsWith(x = names(marker_seq_list), prefix = "chrW")])) should be modified in: ct <- data.frame(Length=do.call(rbind, marker_length_list[startsWith(x = names(marker_seq_list), prefix = "chrZ")]), Count=do.call(rbind, marker_seq_list[startsWith(x = names(marker_seq_list), prefix = "chrZ")])) In this way it mimics the code for the XY system.

In lines 118, 123, 131, 140 if (sextype = "XY") and if (sextype = "ZW") should be replaced with if (sextype == "XY") and if (sextype == "ZW")

Thank you again for your tool :)

Valentina

lovemun commented 4 years ago

Thank you for recommendation. Let me know what the errors were found, then I will help you to solve the problem. And can you show me your running command?

ValentinaBoP commented 4 years ago

Hi, I think the problem is now solved by the modifications made (see above).

More precisely, the if statement as it is written if (sextype = "XY") throws an error since it is expecting a comparison (TRUE/FALSE) and not a value assignment.

For the ct dataframe, there are no error complaints per se in that precise command because yes it works but it is not right. In the ZW system the W-specific values are assigned to both columns and the Z-specific are never assigned/saved. I found this out because I had several female and male libraries and for the females the script worked fine but never for the males. Indeed for females there are W-specific values different from 0 therefore when rs is calculated (always resulted in 1 since the two columns ZCount and WCount contain the same values) the script worked. For males that had only 0 for the W, 0s were assigned in both columns and rs resulted in NaN and the script could not determine the sex.

By modifying the command following the XY system, the script worked and the sexes were correctly determined for male and females.

ml bioinfo-tools samtools bwa
ml R_packages/3.6.0

SCRIPT=/Valentina/Code/SEXCMD
INDIR=/Valentina/Intermediate/GalGal/
OUTDIR=/Valentina/Data/GalGal/
FINAL=$INDIR/sex_marker.galGal4.filtered.final.github.fasta

cd $OUTDIR

for INPUT in $( ls *.fastq.gz )
do
 Rscript $SCRIPT/SEXCMD.R $FINAL 3 ZW $INPUT
done