jordiabante / biodive

A reference-free statistical approach to diversity-generating & mobile genetic element discovery

Original fasta sequence ID information missing in output #2

Open mhyleung opened 1 year ago

mhyleung commented 1 year ago

Dear developers

Would it be possible to include, somewhere in the output files, the FASTA ID of the input sequence in which each output anchor sequence was detected? At the moment my output hits are named seq1/seq2, with no indication of which input sequences the anchors come from. I am trying to locate these anchors in my input sequences without having to run BLAST between the output anchor sequences and the input sequences.

Thank you very much

Regards

Marc

jordiabante commented 1 year ago

Hi Marc,

Thank you for your suggestion. We agree this would be helpful, but we need to check whether it is worth implementing, because tracking the source sequence of every anchor would mean keeping even more data in memory.

What you're proposing is one solution, but it is also possible to use a simple grep command to search the original FASTQ files for the anchors of interest (and their targets) and retrieve the original sequences. This is what we did in our GB paper, and we didn't find it to be a big issue.

Thanks, jordi
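The grep-style lookup described above can be sketched in Python for readers whose input is multi-line FASTA, where a plain line-based search (for four-line FASTQ, something like `grep -B1 ANCHOR reads.fastq`) would not report the record ID. This is a minimal sketch and not part of biodive: the helper names `read_records` and `find_anchor` are hypothetical, matching is exact (no mismatches), FASTQ is assumed to use exactly four lines per record, and the reverse strand is searched as an assumption about what a user might want.

```python
import io
from typing import Iterable, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _record_id(header: str) -> str:
    """First whitespace-delimited token after the leading '>' or '@'."""
    return (header[1:].split() or [""])[0]


def read_records(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) from FASTA or four-line FASTQ text."""
    lines = (line.rstrip("\n") for line in handle)
    first = next(lines, None)
    if first is None:
        return
    if first.startswith("@"):
        # FASTQ: header, sequence, '+' separator, quality (assumed 4 lines).
        header = first
        while header is not None:
            if header.strip():
                seq = next(lines, "")
                next(lines, None)  # '+' separator
                next(lines, None)  # quality string
                yield _record_id(header), seq
            header = next(lines, None)
    else:
        # FASTA: a sequence may span several lines.
        record_id, chunks = _record_id(first), []
        for line in lines:
            if line.startswith(">"):
                yield record_id, "".join(chunks)
                record_id, chunks = _record_id(line), []
            else:
                chunks.append(line.strip())
        yield record_id, "".join(chunks)


def find_anchor(handle: Iterable[str], anchor: str,
                both_strands: bool = True) -> List[Tuple[str, int, str]]:
    """Return (record_id, 0-based offset, strand) for every exact match."""
    forward = anchor.upper()
    queries = {"+": forward}
    reverse = reverse_complement(forward)
    if both_strands and reverse != forward:
        queries["-"] = reverse
    hits = []
    for record_id, seq in read_records(handle):
        seq = seq.upper()
        for strand, query in queries.items():
            pos = seq.find(query)
            while pos != -1:
                hits.append((record_id, pos, strand))
                pos = seq.find(query, pos + 1)
    return hits


# Toy input; with real data, pass an open file handle instead.
example = io.StringIO(">contigA sample\nAAACGT\nTTGCA\n>contigB\nGGGGG\n")
print(find_anchor(example, "CGTTTG"))  # [('contigA', 3, '+')]
```

For a "-" hit, the offset is where the reverse complement of the anchor starts in the input sequence. The same function can be called once per anchor or target taken from the output files.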

------- Original Message ------- On Monday, 6 November 2023 at 12:28 PM, Marcus H Y Leung @.***> wrote:
