lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Unspecific HLA datasets #99

Closed karimwh closed 8 months ago

karimwh commented 8 months ago

Hello Michael,

I ran into an issue trying to use Sage with HLA datasets. I want the cleavage to be unspecific, and based on the documentation the cleavage setting should be left empty for that. When I run it this way, the process is killed.

Another question concerns the generation of decoys. In the protein column, is it possible for a peptide to occur in, say, two decoy proteins and one target protein? Thanks in advance.

Best, Karim Abdelfattah

karimwh commented 8 months ago

One more question: does the calculated mass include the modifications or not?

lazear commented 8 months ago

Hi Karim,

The process is killed due to lack of RAM: fragment indexing requires pre-generating every single b/y ion for every peptide in the search space, and this gets quite large for unspecific searches! I am going to refer you to https://github.com/lazear/sage/issues/97#issuecomment-1781969281, where a workaround Python script is presented for splitting a FASTA file into chunks (reducing the search space) and running Sage once per chunk.
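
To illustrate the chunking idea, here is a minimal sketch of splitting FASTA text into roughly equal parts. It is not the script from the linked comment; the function names `read_fasta` and `split_fasta` and the round-robin assignment are assumptions made for illustration only.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(text: str, n_chunks: int) -> List[str]:
    """Distribute records round-robin into at most n_chunks FASTA strings."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, (header, seq) in enumerate(read_fasta(text)):
        chunks[i % n_chunks].append(f"{header}\n{seq}\n")
    # Drop empty chunks when there are fewer records than chunks.
    return ["".join(c) for c in chunks if c]
```

Each returned string would be written to its own FASTA file and searched in a separate Sage run, so that only one chunk's fragment index has to fit in RAM at a time.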

For decoy generation, decoy sequences that also appear in the target database are removed - so a peptide should never be shared across both decoy proteins and a target protein. If you find a case where this happens, please let me know since it is a bug.
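
If you want to check your results for this, something like the sketch below would flag peptides assigned to both decoy and target proteins. It assumes you have already extracted (peptide, proteins) pairs from the results table, that protein names are separated by `;`, and that decoys carry a `rev_` prefix; adjust these if your configuration differs. The function name `mixed_target_decoy` is hypothetical.

```python
from typing import Iterable, List, Tuple


def mixed_target_decoy(
    rows: Iterable[Tuple[str, str]],
    decoy_tag: str = "rev_",  # assumed decoy prefix
    sep: str = ";",           # assumed protein separator
) -> List[str]:
    """Return peptides whose protein list mixes decoy and target entries."""
    flagged = []
    for peptide, proteins in rows:
        names = [p.strip() for p in proteins.split(sep) if p.strip()]
        has_decoy = any(n.startswith(decoy_tag) for n in names)
        has_target = any(not n.startswith(decoy_tag) for n in names)
        if has_decoy and has_target:
            flagged.append(peptide)
    return flagged
```

An empty list means no peptide was shared across decoy and target proteins.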

Calculated mass includes any specified variable or static mods present in the peptide sequence, but will not include any open modification search masses.
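
As a rough illustration of what "includes the mods" means, the sketch below sums monoisotopic residue masses, adds water for the termini, and then adds any per-position modification masses. This is not Sage's code; the function name `calc_mass` and the way mods are passed in (a position-to-mass dict) are assumptions for the example.

```python
from typing import Dict, Optional

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def calc_mass(sequence: str, mods: Optional[Dict[int, float]] = None) -> float:
    """Neutral monoisotopic mass of a peptide plus its modification masses.

    mods maps a 0-based residue index to a mass shift in Da
    (hypothetical input format, for illustration only).
    """
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    for position, delta in (mods or {}).items():
        if not 0 <= position < len(sequence):
            raise ValueError(f"mod position {position} outside peptide")
        mass += delta
    return mass
```

For example, `calc_mass("PEPTMIDE", {4: 15.994915})` would include the oxidation on the methionine, while a mass shift found only through an open search would not be passed in and so would not be part of the calculated mass.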

karimwh commented 8 months ago

Thank you so much for your help. Regarding the peptide sequence shared between proteins, I checked earlier today, and didn't find such an occurrence.