Closed: marieleoz closed this issue 2 years ago
Dear Marie,
I can definitely answer the last question. The outputs are indeed both files (centrifuge_out.tsv and readIDTaxID.txt). The latter is the "real" output, containing two columns (readID and taxID). With this file we generally create an OTU table in R. If you multiplexed the samples before sequencing everything with ONT, you receive a sequencing_summary.txt file after basecalling with Guppy. From this file you can extract the barcode and readID columns. After merging both tables (sequencing_summary.txt and readIDTaxID.txt) in R, you are able to assign each read to its barcode/sample.
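In case it helps to see the merge step concretely, here is a minimal sketch in Python/pandas (the pipeline itself uses pandas); the column names and the read/barcode values are made up for illustration and should be adjusted to the actual headers of your two files:

```python
import pandas as pd

# Hypothetical stand-ins for the two tables described above.
# sequencing_summary.txt: read ID -> barcode (from guppy basecalling)
seq_summary = pd.DataFrame({
    "read_id": ["r1", "r2", "r3"],
    "barcode": ["BC01", "BC01", "BC02"],
})
# readIDTaxID.txt: read ID -> assigned taxID
read_taxid = pd.DataFrame({
    "read_id": ["r1", "r2", "r3"],
    "taxID": [9, 545, 9],
})

# Inner join on the read ID assigns each classified read to its barcode.
merged = seq_summary.merge(read_taxid, on="read_id", how="inner")

# Counting reads per barcode/taxID pair gives an OTU-style table.
otu = merged.groupby(["barcode", "taxID"]).size().unstack(fill_value=0)
print(otu)
```

The same two-step join-then-count is straightforward to reproduce in R with merge() and table() if you prefer to stay in R as described above.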
Regarding the other issues, we will get back to you on these pretty soon.
Best,
Christoph
Dear Marie,
can you please provide your input file? I would like to reproduce your problems.
If you want/need more files you could use --verbose, but this is more for debugging purposes.
Best,
Tim
Thanks to you both!
Tim, here's a link to the single fastq file I used for TEST4: https://we.tl/t-mGZzKo4k4k
In the meantime I went on with the full series of fastq files I got for this sample, and got some more errors at the end + the readIDTaxID was not produced at all:
Traceback (most recent call last):
  File "/src/mmp2_processing.py", line 57, in
(command: python /src/mmp2_processing.py /files/ 1000:50 0 true)
cp: cannot stat '/files//readIDTaxID*': No such file or directory
Please let me know if I should create a separate issue and/or send you more data for this one.
Apologies for the trouble :)
Best, Marie
Dear Tim,
Any luck with your investigations? I ran the analysis again on the single fastq file, but in verbose mode; see TEST4verb.log attached (if needed I can also provide some more files that were produced during the run).
I don't know if you noticed earlier, but it looks like minimap2 fails processing TaxID 9 (the first one to be processed). It gets stuck there for a while, then moves on to TaxID 545 and indicates: /src/metapont: line 163: 69 Killed minimap2 -2 -c --secondary=no -t "$threads" -x map-ont "$(dirname $reference_seqs)/fastaTaxID/$taxID.gz" "$tmp_directory/minimap2output/temp.fq" > "$tmp_directory/minimap2output/$taxID.paf"
Couldn't this be why we then get errors such as "Length of header or names does not match length of data" or "Expected 23 fields in line 9195, saw 24"?
Looking forward to hearing from you.
Best, Marie
Hey Marie, I am back from vacation now. I will see if I can have a look at your problems sometime this week. Cheers, Tim
Ok great! Thank you Tim. Looking forward to hearing back from you. Cheers, Marie
Hey Marie,
the 2nd issue from your first post at the top of this page is not an issue.
/src/mmp2_processing.py:57: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
can be ignored.
Why does this warning show?
Minimap2 returns a table with a varying number of columns (i.e. one row has 15 columns, the next row has 20) because it outputs a PAF-formatted file and appends SAM-style tags to each row.
If you still have the verbose output, you should be able to have a look at the AS tag, which is the alignment score; it is within the first 23 columns. The rest is not interesting for us, as it just gives the coordinates of the minimap2 hits (as far as I remember).
There is a list of the different columns of the minimap2 output here: https://lh3.github.io/minimap2/minimap2.html (scroll down)
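To illustrate the point about varying column counts, here is a small Python sketch on two made-up PAF-like rows (12 mandatory columns, then a different number of optional tags each); it pulls out the AS tag by name rather than by column position, which sidesteps the ragged-row problem entirely:

```python
import io

# Two fabricated PAF-like rows: same 12 mandatory columns,
# but a different number of trailing SAM-style tags.
paf = (
    "read1\t100\t0\t90\t+\tref\t500\t10\t100\t80\t90\t60\tAS:i:150\n"
    "read2\t120\t5\t110\t+\tref\t500\t20\t125\t95\t105\t60\tAS:i:170\ttp:A:P\n"
)
rows = [line.rstrip("\n").split("\t") for line in io.StringIO(paf)]

def alignment_score(fields):
    """Find the AS (alignment score) tag wherever it sits in the row."""
    for f in fields:
        if f.startswith("AS:i:"):
            return int(f.split(":")[2])
    return None

scores = [alignment_score(r) for r in rows]
print(scores)  # [150, 170]
```

Because the optional tags are typed key:value fields, scanning for the `AS:i:` prefix is more robust than assuming the score always lands in the same column.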
As for the one you reported on March 17th (pandas.errors.ParserError: Error tokenizing data. C error: Expected 23 fields in line 9195, saw 24): this is weird and exactly what I tried to prevent (I think). Can you by chance send me line 9195?
I have not seen the "69 Killed" error before. I found here that it could be an error message meaning 'service unavailable'. Can you check whether it still happens if you submit a subset of your data? Maybe the dataset was too large.
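If it helps, here is a minimal sketch of building such a subset, assuming an uncompressed FASTQ where every record is exactly four lines (the function name and the demo reads are made up):

```python
def subset_fastq(src_lines, n_reads):
    """Return only the first n_reads FASTQ records (4 lines per record)."""
    return src_lines[: 4 * n_reads]

# Tiny fabricated FASTQ with two records.
fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "TTTT", "+", "IIII"]
print(subset_fastq(fastq, 1))  # ['@r1', 'ACGT', '+', 'IIII']
```

On the command line, the equivalent for the first 1000 reads would be taking the first 4000 lines of the file, since each FASTQ record spans four lines.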
I am sorry that you had to wait some time, but I do this in my free time, as a hobby, so please be a bit patient :-)
Cheers, Tim
EDIT: I just saw the people at minimap2 have been busy and released a few new versions (and a Nature paper ^^). I pinned the version in MeTaPONT to the older version (2.17) that we used in our paper. Can you check whether that resolves the issue?
EDIT 2: regarding the abundance issue: this seems to be a problem with Centrifuge. I don't think you have to worry, as we don't use it anyway. Have a look here.
Hi Tim,
Thanks a lot for your feedback. You do have interesting hobbies :)
Based on your EDIT 1, I tried to re-install MeTaPONT so that I could try with minimap2 v2.17. I'm not sure I did it right, but the way I did it (moving all my previous MeTaPONT files into a folder and re-doing git clone + docker build), the "real" issues are still here :/
We can compare the results from the 3 runs I ran in verbose mode:
1) minimap2 always fails processing the first TaxID, whether I use a single fastq file or all fastq files from my sample: the process is stuck there for a while and my computer doesn't like this at all. Then I get the Killed error (note that it's not always "69 Killed"; TEST8verb and TEST8new got 76 and 77 respectively), and it then goes smoothly through all the other TaxIDs.
2) the pandas error when parsing the minimap2 output only arose when investigating multiple fastq files, though not at the same line for TEST8verb (line 9195) and TEST8new (line 6434). I wish I could extract these lines for you, but I guess they would have been in the mmp2_out.tsv file, which is not produced by these two runs. I just have the .paf files, and it does look like they all have only 23 columns.
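For checking the .paf files, one quick way to spot any row exceeding 23 fields is to count tab-separated fields per line; a small Python sketch (the demo lines are fabricated, the function name is made up):

```python
def ragged_lines(lines, expected=23):
    """Return (1-based line number, field count) for lines with too many fields."""
    return [
        (i + 1, len(line.split("\t")))
        for i, line in enumerate(lines)
        if len(line.split("\t")) > expected
    ]

# Fabricated demo: the second line has 24 tab-separated fields.
demo = ["a\tb\tc", "a\t" * 23 + "x"]
print(ragged_lines(demo))  # [(2, 24)]
```

Running this over each .paf file (reading it with open().read().splitlines()) would confirm whether any line really carries a 24th column, or whether the extra field appears only in the intermediate file pandas was reading.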
Please let me know what else you think I could try / send. We are going to be in a hurry pretty soon because this is for a student project, so in the meantime I'll try to use Centrifuge and/or minimap2 alone.
Cheers, Marie
Dear Marie,
sorry for the delay, but everything should be fine now. We tested the updated version from scratch and no errors or issues occur anymore. I hope the pipeline is working for you as well. If you have further issues, please do not hesitate to open a new issue or comment.
Best
Christoph
Dear Christoph,
It does work for me as well :) Thanks a lot!
Best, Marie
Dear Tim,
Thanks a lot for solving my previous issue! I successfully ran a first test to the end :) Please could you help me with 3 new issues/questions that emerged?
1) Is it expected that no abundance is calculated by Centrifuge? Here's what the log says:
report file centrifuge_report.tsv
Number of iterations in EM algorithm: 0
Probability diff. (P - P_prev) in the last iteration: 0
Calculating abundance: 00:00:00
(see TEST4.log attached)
Indeed the "abundance" column in centrifuge_report.tsv only has 0.00 values.
2) This issue arises after minimap2 has processed all the TaxIDs; I get a series of this:
/src/mmp2_processing.py:57: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
tmp_df = pd.read_csv(output_dir + '/minimap2output/' + file, index_col=False, sep='\t',
I'm not sure whether this is critical, and how to fix it if needed?
3) Is it expected that the only output files (besides centrifuge_report.tsv, which I actually found in the database folder) are the centrifuge_out.tsv and readIDTaxID.txt files?
Thanks a lot!
Best, Marie