renzilin / NASTRA

Innovative Short Tandem Repeat Analysis through Cluster-Based Structure-Aware Algorithm in Nanopore Sequencing Data
GNU General Public License v3.0

Question about input files and output #1

Closed HLHsieh closed 3 months ago

HLHsieh commented 4 months ago

Hi there,

I'm really interested in this tool! I'd like to know if it's possible to use it to determine the number of repeats on each allele.

Additionally, I'm curious about how to obtain the repeat_structure.pat, panel_forenseq.csv, and threshold.cfg files. Could you please provide some guidance on obtaining these files and explain the meaning of each column in them, as well as in the output files like barcode01.txt and barcode01.log?

Regarding the output barcode01.txt from executing the sample code, it seems that for the first several loci, no sequence is detected in the sample. Am I interpreting this correctly?

barcode,locus,seq,genotype,sn,sn_ratio
barcode01,DYS19,<None>,<None>,0,<LowCov>
barcode01,DYS390,<None>,<None>,0,<LowCov>
barcode01,DYS391,<None>,<None>,0,<LowCov>
barcode01,DYS392,<None>,<None>,1,<LowCov>
barcode01,DYS437,<None>,<None>,0,<LowCov>
barcode01,DYS438,<None>,<None>,0,<LowCov>
barcode01,DYS439,<None>,<None>,0,<LowCov>
barcode01,DYS448,<None>,<None>,2,<LowCov>
barcode01,DYS460,<None>,<None>,0,<LowCov>
barcode01,DYS481,<None>,<None>,0,<LowCov>
barcode01,DYS505,<None>,<None>,0,<LowCov>
barcode01,DYS522,<None>,<None>,0,<LowCov>
barcode01,DYS533,<None>,<None>,0,<LowCov>
barcode01,DYS549,<None>,<None>,0,<LowCov>
barcode01,DYS570,<None>,<None>,0,<LowCov>

barcode01,DXS8378,AAT [ATAG]9 TGA,9.0,870,1.0
barcode01,DXS8378,AAT [ATAG]11 TGA,11.0,570,0.65517
barcode01,DXS8378,AAT [ATAG]10 TGA,10.0,47,0.05402
barcode01,DXS8378,AAT [ATAG]8 TGA,8.0,24,0.02759
barcode01,HPRTB,TCT [ATCT]12 AAA,12.0,269,1.0
barcode01,HPRTB,TCT [ATCT]13 AAA,13.0,209,0.77695
barcode01,HPRTB,TCT [ATCT]11 AAA,11.0,28,0.10409
barcode01,HPRTB,TCT [ATCT]14 AAA,14.0,7,0.02602
barcode01,HPRTB,TCT [ATCT]10 AAA,10.0,6,0.0223

Thank you, Hsin

renzilin commented 4 months ago

Sorry for the late reply. The FASTQ data is being uploaded to cloud storage. When the upload is done, I'll send you the link. Best, Zilin Ren

On May 4, 2024, at 7:38 AM, HLHsieh @.***> wrote:



Hi there,

I am interested in this amazing tool. Could you please provide test data? Additionally, I'm curious if I can use this tool to determine the number of repeats on each allele.

Thank you, Hsin


renzilin commented 3 months ago

Hi Hsin, The files repeat_structure.pat and panel_forenseq.csv were compiled manually from information available in STRBase (https://strbase.nist.gov/). As for the cutoff file (threshold.cfg), we conducted an experiment for parameter inference, which is described in the manuscript. The preprint is available at https://www.biorxiv.org/content/10.1101/2023.11.04.565630v1.

The log file lists all genotyped alleles, while the txt file shows the final genotype of each locus. If a result shows <LowCov>, coverage at that locus was too low to call a genotype. In your result, the low-coverage loci are all Y-chromosome STRs (the DYS markers), so this may indicate that the individual is female.
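As a rough illustration of reading the output quoted in this thread, the sketch below parses a few of those rows, lists the loci flagged `<LowCov>`, and keeps the alleles whose sn_ratio clears a cutoff. The column names come from the header line shown above. In the quoted rows sn_ratio looks like sn divided by the largest sn at the same locus (for example 570/870 ≈ 0.65517), but that is only an observation from this sample. `MIN_RATIO` and `summarize` are hypothetical names for illustration; they are not NASTRA's actual parameters or API, and the real cutoffs come from threshold.cfg.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the barcode01 output quoted in this thread.
SAMPLE = """\
barcode,locus,seq,genotype,sn,sn_ratio
barcode01,DYS19,<None>,<None>,0,<LowCov>
barcode01,DYS392,<None>,<None>,1,<LowCov>
barcode01,DXS8378,AAT [ATAG]9 TGA,9.0,870,1.0
barcode01,DXS8378,AAT [ATAG]11 TGA,11.0,570,0.65517
barcode01,DXS8378,AAT [ATAG]10 TGA,10.0,47,0.05402
barcode01,DXS8378,AAT [ATAG]8 TGA,8.0,24,0.02759
barcode01,HPRTB,TCT [ATCT]12 AAA,12.0,269,1.0
barcode01,HPRTB,TCT [ATCT]13 AAA,13.0,209,0.77695
barcode01,HPRTB,TCT [ATCT]11 AAA,11.0,28,0.10409
"""

# Hypothetical cutoff, for illustration only (not a value from threshold.cfg).
MIN_RATIO = 0.5


def summarize(text, min_ratio=MIN_RATIO):
    """Return (low-coverage loci, {locus: [genotype strings above the cutoff]})."""
    low_cov = []
    calls = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if row["sn_ratio"] == "<LowCov>":
            # No allele was called at this locus.
            low_cov.append(row["locus"])
            continue
        if float(row["sn_ratio"]) >= min_ratio:
            # Keep the genotype as text so values such as "9.3" survive unchanged.
            calls[row["locus"]].append(row["genotype"])
    return low_cov, dict(calls)


if __name__ == "__main__":
    low_cov, calls = summarize(SAMPLE)
    print("low coverage:", low_cov)
    for locus, alleles in calls.items():
        print(locus, alleles)
```

To run this against a real file, pass the contents of barcode01.txt to `summarize` instead of `SAMPLE`. With the illustrative cutoff, the two highest-supported alleles at each X-chromosome locus are kept and the low-ratio rows, which may be stutter or noise, are dropped.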

If you have any further questions, please contact me directly!

Best, Zilin