lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Semi-enzymatic digests not completing? #106

Open hbarsnes opened 7 months ago

hbarsnes commented 7 months ago

Seems like semi-enzymatic digests are not completing?

At least that is what happens if I try the following settings:

        "enzyme": {
            "missed_cleavages": 2,
            "min_len": 8,
            "max_len": 30,
            "cleave_at": "RK",
            "restrict": "P",
            "c_terminal": true,
            "semi_enzymatic": true
        },

If I remove the last line, or replace it with "semi_enzymatic": null, the search completes and is as fast as before.

I'm using the default SearchGUI/PeptideShaker example dataset/input, hence there should be no unexpected issues there.

Any idea what is happening?

lazear commented 7 months ago

At the moment, semi-enzymatic is kinda unusable unless you have a ton of RAM. Fragment indexing requires pre-digesting every peptide and generating every fragment up front, which is pretty resource-intensive for semi-enzymatic (or no-enzyme) searches.
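To get a feel for why the peptide count grows so much, here is a minimal Python sketch that enumerates peptides for a fully enzymatic and a semi-enzymatic digest using the settings from the config above (cleave after R/K, not before P, 2 missed cleavages, length 8 to 30). It is an illustration only, not Sage's actual digestion code: the function names `cleavage_sites` and `digest` are hypothetical, and the rule used for semi-enzymatic peptides (one enzymatic terminus, the other terminus anywhere inside the fully enzymatic span) is an assumption.

```python
def cleavage_sites(seq, cleave_at="KR", restrict="P"):
    """Return boundary positions: 0, the index after each cleavable residue, len(seq).

    Hypothetical helper: cleaves C-terminal to any residue in `cleave_at`
    unless the next residue is `restrict`.
    """
    sites = [0]
    for i, aa in enumerate(seq[:-1]):
        if aa in cleave_at and seq[i + 1] != restrict:
            sites.append(i + 1)
    sites.append(len(seq))
    return sites


def digest(seq, missed_cleavages=2, min_len=8, max_len=30, semi=False):
    """Return the set of peptides within the length window.

    With semi=True, every prefix and suffix of each fully enzymatic span is
    also considered (assumed definition of semi-enzymatic, see lead-in).
    """
    sites = cleavage_sites(seq)
    peptides = set()
    for i, start in enumerate(sites[:-1]):
        # Allow up to `missed_cleavages` internal sites to be skipped.
        for end in sites[i + 1 : i + 2 + missed_cleavages]:
            full = seq[start:end]
            if min_len <= len(full) <= max_len:
                peptides.add(full)
            if semi:
                for k in range(1, len(full)):
                    # full[:k] keeps the enzymatic N-terminus,
                    # full[k:] keeps the enzymatic C-terminus.
                    for sub in (full[:k], full[k:]):
                        if min_len <= len(sub) <= max_len:
                            peptides.add(sub)
    return peptides
```

Calling `len(digest(protein))` and `len(digest(protein, semi=True))` on the same sequence shows the difference in peptide count; every one of those peptides would additionally need its fragments generated before indexing.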

I am working on an internal database splitting solution to at least partially alleviate the problem (and hopefully improve it over time). In the meantime, you can confirm that it works by either reducing the number of missed cleavages, or using a significantly smaller FASTA file (or processing the FASTA database in chunks: https://github.com/lazear/sage/issues/97#issuecomment-1781969281)
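As a rough illustration of the chunking idea, the following Python sketch splits FASTA text into pieces with a fixed number of protein records each. This is an assumption about what chunking could look like, not the approach described in the linked comment; `split_fasta` and `records_per_chunk` are hypothetical names, and combining the results of the separate searches afterwards is not addressed here.

```python
def split_fasta(text, records_per_chunk):
    """Split FASTA-formatted text into chunks of at most `records_per_chunk` records.

    Hypothetical helper: returns a list of strings, each a valid FASTA
    fragment ending with a newline.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")

    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record; flush the previous one.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip() and current:
            # Sequence line belonging to the current record.
            current.append(line.strip())
    if current:
        records.append("\n".join(current))

    return [
        "\n".join(records[i : i + records_per_chunk]) + "\n"
        for i in range(0, len(records), records_per_chunk)
    ]
```

Each returned chunk could be written to its own file and searched separately, which keeps the number of pre-digested peptides per run smaller.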