Closed HuangLoong closed 3 months ago
This likely means some of your sequences have exactly identical sequence IDs. You can check for duplicates using e.g. seqkit rmdup.
We renamed the sequences after downloading the raw data from NCBI, and no duplicate sequence IDs were detected using seqkit rmdup.
Hi,
do you mind sharing a minimal reproducible example? Do your _1 and _2 files have identical sequence IDs?
cat _1.fa _2.fa > seq.fa
seqkit rmdup seq.fa -D dup.fa
Thank you. Using the commands you provided, I did detect some duplicated sequences. I then looked up those duplicate IDs in each of the paired-end files, and they were found in only one of the two files. Will this affect the final output?
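For anyone wanting to repeat the per-file check without seqkit, here is a minimal sketch using only standard shell tools. The `dup_ids` helper and the `demo_*.fa` files are made up for illustration; the sketch assumes FASTA input and takes the ID to be the header text up to the first whitespace, which I believe matches seqkit's default but have not verified against this dataset.

```shell
# Hypothetical helper: print every sequence ID that occurs more than once
# in a single FASTA file. The ID is assumed to be the header text up to
# the first whitespace.
dup_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d
}

# Tiny made-up files standing in for the real _1.fa and _2.fa.
printf '>r1/1\nACGT\n>r2/1\nGGCC\n>r1/1\nTTAA\n' > demo_1.fa
printf '>r1/2\nACGT\n>r2/2\nGGCC\n' > demo_2.fa

# Report, for each file on its own, how many IDs are duplicated.
for f in demo_1.fa demo_2.fa; do
    n=$(dup_ids "$f" | awk 'END { print NR }')
    echo "$f: $n duplicated ID(s)"
done
```

Checking each file separately avoids flagging mates that share an ID across _1 and _2, which the concatenated check would also report.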
Only the first detected sequence will be counted. This will lead to an underestimation of ARG, 16S, or genome copies.
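To get a rough idea of how many reads this could affect, one option is to compare the number of records with the number of distinct IDs. This is only a sketch: the `count_dropped` helper and `demo.fa` are hypothetical, and it assumes, based on the statement above, that every record after the first with the same ID is skipped. It does not reproduce what ARGs OAP does internally.

```shell
# Hypothetical helper: print "total unique dropped" for one FASTA file,
# where "dropped" is the number of records that repeat an earlier ID.
count_dropped() {
    grep '^>' "$1" | awk '
        { total++; if (!seen[$1]++) unique++ }
        END { printf "%d %d %d\n", total, unique, total - unique }'
}

# Made-up example: ID "a" appears three times, ID "b" once.
printf '>a\nACGT\n>b\nGGCC\n>a\nTTAA\n>a\nCCGG\n' > demo.fa

count_dropped demo.fa   # prints: 4 2 2
```

If the third number is small relative to the first, the effect on the final copy-number estimates is probably minor; otherwise it may be worth making the IDs unique before rerunning.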
I got the warning messages below while running ARGs OAP stage_one on the server. The metagenomic data comes from NCBI and has been processed with kneaddata. However, when I used my own sequencing data, no warnings were generated. What is the reason, and will it affect subsequent analysis? Thanks very much for your answer!
WARNING: Duplicated sequences in sequence extraction.
WARNING: Duplicated sequences in 16S copy number calculation.
WARNING: Duplicated sequences in cell number calculation.