miwipe / ngsLCA

GNU General Public License v3.0

problem filtering with edit distance #33

Open tothuhien opened 1 month ago

tothuhien commented 1 month ago

Hi, I'm trying to run ngsLCA so that it keeps only alignments with an edit distance of exactly 2, so I used -editdistmin 2 -editdistmax 2. But in the result file I see unexpected reads. For example, one read with 10 alignments whose edit distances (NM: tag) are 7,7,11,15,15,7,7,9,8,14, so all > 2, is still reported in the result file. When I subset the bam file to only those alignments, the read is not reported. Nor does it happen when I use a range of edit distances, for example -editdistmin 0 -editdistmax 5. Could you help take a look? Thanks, Hien
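The check described above (listing the NM values of every alignment of a read and seeing whether any falls inside the requested range) can be sketched in Python. This is only a minimal diagnostic sketch, not ngsLCA's own filtering logic: the function names `nm_by_read` and `reads_without_hit_in_range` are hypothetical, and the input is assumed to be SAM text lines such as those printed by `samtools view`.

```python
from collections import defaultdict


def nm_by_read(sam_lines):
    """Map each read name to the list of NM edit distances of its alignments.

    Assumes tab-separated SAM text; header lines (starting with '@') are
    skipped. Alignments without an NM tag contribute no value, but the read
    still appears in the result.
    """
    result = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        nms = result[fields[0]]  # field 1 of a SAM record is the read name
        for tag in fields[11:]:  # optional tags start at field 12
            if tag.startswith("NM:i:"):
                nms.append(int(tag[5:]))
                break
    return dict(result)


def reads_without_hit_in_range(sam_lines, min_dist, max_dist):
    """Return the sorted names of reads for which NO alignment satisfies
    min_dist <= NM <= max_dist (hypothetical helper for manual checking)."""
    return sorted(
        name
        for name, nms in nm_by_read(sam_lines).items()
        if not any(min_dist <= nm <= max_dist for nm in nms)
    )
```

Fed with the lines produced by `samtools view` on the sorted bam file and the values 2 and 2, the second function lists reads that have no alignment with an edit distance of exactly 2. A read from that list showing up in the lca file would be a case like the one reported here.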

miwipe commented 1 month ago

Hi Hien,

When you say result file you mean the lca txt file, correct? Can you send the full command you are using, please?

Best, Mikkel

Dr. Mikkel Winther Pedersen Tenure track assistant professor DNRF - Centre for Ancient Environmental Genomics The Globe Institute University of Copenhagen Oester Voldgade 5-7 1350 CPH C. Denmark

Phone: +45 2927 5342 Mail: @.***


tothuhien commented 1 month ago

Hi, Yes, I meant the lca file. Here are the command lines that I used:

conda activate ngsLCA
ngsLCA -editdistmin $min -editdistmax $max \
 -names taxonomy/names.dmp \
 -nodes taxonomy/nodes.dmp \
 -acc2tax nucl_gb.accession2taxid_40sp.tsv \
 -bam ${dir}/${sample}.sorted.bam \
 -outnames ${outdir}/${sample}_${min}_${max}

Thanks!

miwipe commented 1 month ago

Hmm, the command looks ok. Hien, would you be willing to share the bam file with me? Alternatively, could you send a screenshot of the alignments of the specific reads together with the lca output?

You can also send it to @.***

Thanks!


tothuhien commented 1 month ago

I've sent it to you via email. Thanks.