unique379r / strspy

STRspy: a novel, state-of-the-art, alignment- and quantification-based short tandem repeat (STR) detection and calling tool designed specifically for long-read sequencing data, such as reads from Oxford Nanopore Technologies (ONT) and PacBio.
MIT License

General questions and Error: inconsistent naming convention #8

Open HLHsieh opened 1 month ago

HLHsieh commented 1 month ago

Hi Rupesh,

I have several questions about the usage:

  1. I am wondering whether there are any limitations on its detection length.
  2. Could I use strspy to detect the profile of the repeat sequence CCCCGCGCCCGGCCTTCCCCGGGGTCCCTGCGGCCCCGACTGTGCGCC?
  3. How do I use strspy to quantify the number of contiguous repeat units? I did not see a direct result from the output, or I might have missed this information.

Besides, when running strspy, I got this error:

***** WARNING: File /scratch/kinfai_root/kinfai0/hsinlun/tri_test/align/C9ORF72_1_9R_NanoSim_2x.sorted.bam has inconsistent naming convention for record:
chr1    14337   20040   C9ORF72-1_14451_aligned_12683_F_19_5702_13  0   +

***** WARNING: File /scratch/kinfai_root/kinfai0/hsinlun/tri_test/align/C9ORF72_1_9R_NanoSim_2x.sorted.bam has inconsistent naming convention for record:
chr1    14337   20040   C9ORF72-1_14451_aligned_12683_F_19_5702_13  0   +

I would appreciate any solutions to this.

Best, Hsin

unique379r commented 1 month ago

Hi

  1. There is no limit on the length of the STRs. We have seen STRs on chrY exceeding 100.
  2. You can, but STRspy relies on the database FASTA (flanking_seq -----STR repeats------flanking) and the BED file, so it will work if you are able to prepare both from your reference location (repeat coordinates).
  3. If you provide the db FASTA, STRspy will output frequency files for all STRs that are present in your sample. Please have a look at the test db FASTA and the results.
  4. It would be easier to track down the warning or error if you showed me how you ran STRspy, step by step. My guess is that the error comes from the SNV calling tool, i.e. xAtlas. It probably happens because your BAM is aligned to the genomic reference; xAtlas requires the STR BAM, not the genomic BAM.
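
To make point 2 more concrete, here is a minimal Python sketch of how a custom database entry with the layout flank + repeats + flank could be put together, along with a matching BED line. This is not STRspy code. The function names, the `locus_n` record naming, the placeholder flanks, and the coordinates are all assumptions made for illustration, so please compare against the test db FASTA and BED shipped with the repository before relying on it. The `longest_run` helper is only a sanity check on the generated sequences and is not how STRspy quantifies alleles.

```python
def build_str_db(locus, left_flank, motif, right_flank, repeat_counts):
    """Return FASTA text with one record per candidate allele.

    Each record is left_flank + motif * n + right_flank.
    The ">locus_n" header format is a hypothetical naming scheme.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    records = []
    for n in repeat_counts:
        seq = left_flank + motif * n + right_flank
        records.append(f">{locus}_{n}\n{seq}")
    return "\n".join(records) + "\n"


def bed_line(chrom, start, end, locus):
    """Format one tab-separated BED line for the repeat coordinates."""
    return f"{chrom}\t{start}\t{end}\t{locus}"


def longest_run(seq, motif):
    """Return the longest number of back-to-back copies of motif in seq."""
    if not motif:
        raise ValueError("motif must not be empty")
    k = len(motif)
    best = 0
    for i in range(len(seq) - k + 1):
        run = 0
        j = i
        # Walk forward one motif length at a time while copies keep matching.
        while seq.startswith(motif, j):
            run += 1
            j += k
        best = max(best, run)
    return best


# Example with the motif from the question; flanks and coordinates are
# placeholders, not real reference sequence.
motif = "CCCCGCGCCCGGCCTTCCCCGGGGTCCCTGCGGCCCCGACTGTGCGCC"
fasta_text = build_str_db("MYLOCUS", "ACGTACGTAC", motif, "TGCATGCATG", range(1, 4))
print(fasta_text)
print(bed_line("chr1", 1000, 1144, "MYLOCUS"))
```

The idea is simply to enumerate the repeat counts you expect to see, so that each candidate allele gets its own record for reads to align against.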

Hope this helps!

Rupesh Kesharwani