snijderlab / stitch

Template-based assembly of proteomics short reads for de novo antibody sequencing and repertoire profiling
MIT License

Unrecognized Casanovo result.mztab file line start code #242

Closed Ln9052 closed 11 months ago

Ln9052 commented 11 months ago

Hi Schulte, I have encountered an issue when using Stitch to read the "result.mztab" file from Casanovo. The error message reads, for example: [image] However, I have checked the first lines of the "result.mztab" file and confirmed that they start with valid codes. [image] What steps should I take to resolve this issue? Do you have any suggestions? Thank you.

douweschulte commented 11 months ago

That does indeed look quite weird. Could you send me the full file, either as an attachment here in the issue or to my email (d.schulte(AT)uu.nl)? That way I will be able to check what is wrong on my side.

Ln9052 commented 11 months ago

Thank you for your response. Could you please take a look at the file "sample_preprocessed_spectra.mztab" and try to read it following the Stitch manual, to identify the cause of the issue mentioned above? Thank you once more for your assistance.

-----Original message----- From: "Douwe Schulte" @.> Sent: 2023-11-27 18:45:42 (Monday) To: snijderlab/stitch @.> Cc: Ln9052 @.>, Author @.> Subject: Re: [snijderlab/stitch] Unrecognized Casanovo result.mztab file line start code (Issue #242)

douweschulte commented 11 months ago

I am afraid attaching files in this way does not work. Could you try sending it to me directly or adding it to this issue via the web interface of GitHub?

Ln9052 commented 11 months ago

Thank you for the reminder. I have noticed that the GitHub web page does not allow attaching files in ".mztab" format. I will figure out an alternative way to share the file with you.

Ln9052 commented 11 months ago

image

douweschulte commented 11 months ago

It seems to support compressed files (.zip/.gz), so you could try zipping it first.

Ln9052 commented 11 months ago

Great! Thanks! Please take a look at this (.zip) file: sample_preprocessed_spectra.zip

douweschulte commented 11 months ago

Thanks, I will take a look at it!

douweschulte commented 11 months ago

I did not get the same error. However, the file contained many empty lines, and I found a way to generate this output with them, for which I hope to have built a fix. I additionally found that this file contains charges with a decimal point (2.0 instead of 2), which would give an error as well. Both are fixed, and the nightly build based on the commit I just made has these fixes applied. If this new version does not work for you, let me know and I will take another look. Note: the automated build pipeline takes about 1 hour for all benchmarking to finish; only after that time will the nightly binaries be available.
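To illustrate the two problems described above, here is a minimal Python sketch of a tolerant reader. It is not Stitch's actual code, which is not shown in this thread; the helper names `parse_charge` and `non_empty_lines` are hypothetical and exist only for this example.

```python
def parse_charge(text: str) -> int:
    """Accept a charge written as '2' or '2.0' and return it as an int.

    Hypothetical helper: values with a real fractional part (e.g. '2.5')
    are rejected, since a charge state must be a whole number.
    """
    value = float(text.strip())
    if not value.is_integer():
        raise ValueError(f"charge is not a whole number: {text!r}")
    return int(value)


def non_empty_lines(text: str) -> list[str]:
    """Split text into lines and drop the ones that are empty or blank.

    str.splitlines() treats a lone carriage return as a line break too,
    so stray carriage returns show up as empty lines and are filtered out.
    """
    return [line for line in text.splitlines() if line.strip()]
```

With helpers like these, a charge column holding `2.0` parses the same as `2`, and blank lines between records are skipped before any line start code is checked.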

Ln9052 commented 11 months ago

Thank you very much. I will carefully review your comments, run the nightly version of Stitch, and report back on how it goes. Thanks again.

Ln9052 commented 10 months ago

Hi Schulte, I found that the nightly version (#372, "Small mztab issues for #242") did not work for me. This is my batch file: monoclonal_try_casanovo.txt. And this is the error information: [image] Thanks.

douweschulte commented 10 months ago

The trouble is that the file has a lot of weird newlines ('enters'), and these break the parser while it reads the file. My test computer runs Linux, and there it works, but this stuff works differently on Windows. I will try to make a fix in the code itself, but you could also, if you feel comfortable, change the file so it no longer contains these newlines. To do this, open the file in a text editor, copy the empty line you see there, and replace it (Control+H in most programs) with nothing. That should make the file run immediately, without having to wait for my fix.

douweschulte commented 10 months ago

On further inspection I see that your file uses \r\r\n as its newline pattern, which is extremely weird. What kind of system did you run Casanovo on? This pattern does not work well with the code I wrote to detect the newline pattern of any file, as it is not a sensible newline pattern. I attached the file in which I replaced \r\r\n with \r\n, which is the normal newline pattern on Windows. This file should work. If there are any more problems, feel free to let me know. Attachment: sample_preprocessed_spectra_normalised.zip
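The replacement described above can also be scripted. The following Python sketch is one way to do it, assuming the whole file fits in memory; the function names and file paths are hypothetical and not part of Stitch or Casanovo. The file is handled as bytes so that Python's own newline translation does not interfere.

```python
def normalise_newlines(data: bytes) -> bytes:
    # Collapse the unusual \r\r\n sequence into the regular Windows \r\n.
    # Lines that already end in \r\n or \n are left untouched.
    return data.replace(b"\r\r\n", b"\r\n")


def normalise_file(src_path: str, dst_path: str) -> None:
    # Read and write in binary mode so no newline translation is applied.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalise_newlines(data))
```

For example, `normalise_file("result.mztab", "result_normalised.mztab")` would write a cleaned copy next to the original, leaving the original file unchanged.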

Ln9052 commented 10 months ago

Thank you very much. I am running Casanovo on a Windows system, and its result file contains many empty lines. I will try the file you sent to see if Stitch can read it, compare the two files, and try to identify the reason for the differences.

douweschulte commented 10 months ago

If you are running Casanovo on Windows, that could be the reason why the output is so weird. In that case an issue should probably be raised with Casanovo itself, so they can fix their output.

Ln9052 commented 10 months ago

I have just used the file you provided, and it ran successfully. Thank you for your advice and suggestions. The issue may lie in Casanovo's output. Thank you very much for your assistance.