snijderlab / stitch

Template-based assembly of proteomics short reads for de novo antibody sequencing and repertoire profiling
MIT License

Casanovo local confidence invalid #244

Open Ln9052 opened 8 months ago

Ln9052 commented 8 months ago

When I use Stitch to read the results.zip of pNovo v3.1.5 and run new_PNovo_FLAG_H_20ppm_ALC90.txt, an error occurs. It seems that Stitch has trouble determining the length of the sequence in the pNovo result. In the segment "15903 │ …4966.14966.4.0.dta cGYWRQRWVVRGFCbLNFSSM 0.155942 9.26071 0.427735,0.266975,0.173732,0.131206,0.101629,0.0852…", the length of the sequence is actually 21, but Stitch recognized it as 20 and reported an error, indicating a mismatch with the number of local confidences. Could you please help resolve this issue? Thank you.

douweschulte commented 8 months ago

To be perfectly frank with you, the parsing of pNovo sequences is a bit patchy. I do not know what they mean with their sequences, and the local confidence does not match the sequence length on many occasions. Because there is no documentation for their output, I cannot work out the true meaning of the sequences. So the fix I just made turns this error into a warning and ignores the local confidence for the peptides where the length does not match. This is not a proper fix, but it makes it possible to use the data.
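As a rough illustration of that fallback (not Stitch's actual code; the function name and signature here are hypothetical), a minimal Python sketch could parse the comma-separated local confidences and drop them with a warning when their count does not match the sequence length:

```python
import warnings


def local_confidence_or_none(sequence_length, raw_confidence):
    """Return the parsed local confidences, or None if the count is off.

    Hypothetical helper: `raw_confidence` is assumed to be the
    comma-separated confidence column from a pNovo result line.
    """
    values = [float(v) for v in raw_confidence.split(",") if v.strip()]
    if len(values) != sequence_length:
        # Downgraded from an error to a warning: the peptide is kept,
        # only its local confidence is ignored.
        warnings.warn(
            f"Local confidence has {len(values)} values but the sequence "
            f"has length {sequence_length}; ignoring local confidence."
        )
        return None
    return values
```

With this shape, a caller can treat `None` as "no local confidence available" and still use the peptide itself.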

Some more insight into how I understand pNovo peptides; if you know their format a bit better, feel free to add to my list (examples use ProForma notation).

  1. The param file defines the modified amino acids (each can be an uppercase letter, lowercase letter, or digit), e.g. a = A[mod]
  2. Any modified amino acid at the start indicates an N-terminal modification, meaning the modification stays but the amino acid is ignored, e.g. aCC => [mod]-CC
  3. Any modified amino acid at the very end has to be ignored in its entirety, e.g. CCa => CC, CCaa => CCA[mod]
  4. Any modified amino acid in the middle of the sequence is the amino acid with its modification, e.g. CaC => CA[mod]C
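The four rules above can be sketched in Python as follows. This is only an illustration of the rules as listed, not the parser Stitch uses; the function name and the `mods` table (standing in for the pNovo param file) are assumptions made for the example.

```python
def pnovo_to_proforma(sequence, mods):
    """Convert a pNovo peptide string to ProForma using the four rules.

    `mods` is a hypothetical stand-in for the param file (rule 1):
    it maps a pNovo character to (amino acid, modification name),
    e.g. {"a": ("A", "mod")}.
    """
    chars = list(sequence)

    # Rule 3: a modified amino acid at the very end is dropped entirely.
    if len(chars) > 1 and chars[-1] in mods:
        chars.pop()

    # Rule 2: a modified amino acid at the start becomes an N-terminal
    # modification; the amino acid itself is ignored.
    prefix = ""
    if chars and chars[0] in mods:
        _, mod = mods[chars[0]]
        prefix = f"[{mod}]-"
        chars = chars[1:]

    # Rule 4: any remaining modified character is the amino acid
    # together with its modification.
    body = []
    for c in chars:
        if c in mods:
            aa, mod = mods[c]
            body.append(f"{aa}[{mod}]")
        else:
            body.append(c)
    return prefix + "".join(body)


mods = {"a": ("A", "mod")}
for peptide in ["aCC", "CCa", "CCaa", "CaC"]:
    print(peptide, "=>", pnovo_to_proforma(peptide, mods))
```

Note that under these rules the resulting residue count can be smaller than the raw string length, which is one way a mismatch with the number of local confidences can arise.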

The above set of rules parsed the files I made with pNovo correctly, but it does not seem to work for your file. I tried a couple of iterations of these rules (not ignoring the AA for the N-term, not ignoring the C-term) but I could not find rules that work for all peptides.

MengTingHe2023 commented 8 months ago

Hi, as far as I know, the latest update, pNovo (v3.1.5), addressed the issue of cumulative fixed modifications during multiple searches in the GUI. Additionally, it fixed the anomaly related to amino acid case sensitivity in searches. The sequences and the local confidences can now match in length. I think you're right that the AA should not be ignored for the N-term, nor for the C-term. The documentation for their output can be found in the pNovo installation package: pNovo 3 User Guide.pdf.

douweschulte commented 8 months ago

Thanks for your input! I think I would need to take a full day to dive into the format again, because I never got the length of the sequence to match the number of local confidences before (with 3.1.4, I think). So maybe this is fixed in 3.1.5.