hariszaf / pema

PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes

Wrong sample names in OutputPerSample #69

Open JustinePa opened 2 weeks ago

JustinePa commented 2 weeks ago

Hi!

When checking "outputPerSample", I found that for some of my samples the accession number in the file name is correct, but the file itself contains a different accession number. See the example below with the files for ERR4914067: this ERR number does not appear inside these files, but ERR4914068 and ERR4914071 do. "profile_ERR4914067.csv" even contains three ERR numbers, two of which are ERR4914068.

profile_ERR4914067.csv Relative_Abundance_ERR4914067.csv Richness_ERR4914067.csv All_Cumulative_ERR4914067.csv

Additionally, ERR4914067 does not appear in the final table. This may be related to what I described above.
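
To see how many samples are affected, the check described above (accession in the file name versus accessions inside the file) can be sketched in Python. This is a minimal sketch and not part of PEMA: the function name `find_mismatches` is hypothetical, and it assumes that the per-sample files end in `_<accession>.csv` and that accessions look like ERR/SRR/DRR followed by digits.

```python
import re
from pathlib import Path

# Assumption: file names end in "_<accession>.csv", e.g. "profile_ERR4914067.csv".
NAME_RE = re.compile(r"_([EDS]RR\d+)\.csv$")
# Assumption: accessions inside the files are ERR/SRR/DRR followed by digits.
ACC_RE = re.compile(r"(?<![A-Za-z])[EDS]RR\d+")


def find_mismatches(folder):
    """Map each file whose content never mentions the accession from its
    own file name to the sorted list of accessions it does contain."""
    root = Path(folder)
    mismatches = {}
    for path in sorted(root.rglob("*.csv")):
        match = NAME_RE.search(path.name)
        if match is None:
            continue  # file name does not follow the assumed pattern
        expected = match.group(1)
        found = set(ACC_RE.findall(path.read_text(errors="replace")))
        if expected not in found:
            mismatches[str(path.relative_to(root))] = sorted(found)
    return mismatches
```

Calling `find_mismatches("outputPerSample")` (the path is an assumption about where the folder sits) returns an empty dictionary when every file mentions its own accession.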

Thanks for your input!

hariszaf commented 2 weeks ago

Please attach your parameters.tsv file.

JustinePa commented 2 weeks ago

PemaParameters_v214_18S_2023_pr2.txt

hariszaf commented 2 weeks ago

This is a v.2.1.4-related issue. I am not sure whether it still applies in v.2.1.5; I believe it is now fixed, at least for a number of gene-preprocess-clustering combinations. @savvas-paragkamian and @cpavloud have tested v.2.1.5 in some cases; could you share any insight?
In my tests, the files are as they are supposed to be.

JustinePa commented 2 weeks ago

Do you know what might be creating this?

I am still using v.2.1.4 as I am using LifeWatch services that only have this version for now. So is there anything specific I have to look out for?

hariszaf commented 2 weeks ago

I would have to check, and this is a bad time for me to work on this. One would have to check the postAssignment module. I will be able to do that from September. I suggest you discuss with the LW people whether you can work with v.2.1.5, so that everything is fixed once.