kusterlab / curve_curator

Analysis platform for large-scale dose-dependent data
Apache License 2.0

CurveCurator #21

Closed grbergeron closed 5 months ago

grbergeron commented 5 months ago

Hi, I am unsure how to interpret this error or how to fix it.

Uncaught exception

Traceback (most recent call last):
  File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Modified sequence'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users.conda\envs\CurveCuratorEnv\Scripts\CurveCurator.exe\__main__.py", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\__main__.py", line 99, in main
    data = data_parser.load(config)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\data_parser.py", line 427, in load
    df = load_mq_tmt_peptides(path, search_engine_version, unique_cols=unique_cols, sum_cols=raw_cols, first_cols=first_cols, max_cols=max_cols)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\curve_curator\data_parser.py", line 156, in load_mq_tmt_peptides
    df['Modified sequence'] = clean_modified_sequence(df['Modified sequence'])
                                                      ~~^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\frame.py", line 4090, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users.conda\envs\CurveCuratorEnv\Lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'Modified sequence'

FloBay commented 5 months ago

Hey..

It seems you are running CurveCurator in TMT-Peptide mode based on MaxQuant output. In this mode, CurveCurator expects the evidence.txt file as input. This input file must have a column called 'Modified sequence' (case-sensitive). However, CurveCurator cannot find this column name in the file you provided, which is why it reports KeyError: 'Modified sequence'.
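A quick way to confirm this kind of mismatch before running CurveCurator is to inspect the header row of the input file. The following is a minimal sketch, not part of CurveCurator: the helper name `missing_columns` is hypothetical, and it assumes the file is tab-separated text, as MaxQuant's evidence.txt normally is.

```python
import csv


def missing_columns(path, required=("Modified sequence",), delimiter="\t"):
    """Return the required column names that are absent from the header row.

    Hypothetical helper for illustration; the comparison is case-sensitive,
    matching the behaviour described for the 'Modified sequence' column.
    """
    # "utf-8-sig" strips a byte-order mark if present, so the first
    # column name is not silently altered.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    present = set(header)
    return [name for name in required if name not in present]
```

Calling `missing_columns("evidence.txt")` returns an empty list when the column is present, and `['Modified sequence']` when it is absent or spelled with different capitalization.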

May I ask what type of data you have? I suspect it is something other than TMT peptide data searched with MaxQuant.

Best Flo

grbergeron commented 5 months ago

Hi Flo, Thank you for your prompt response!!

I was trying to test the system by using the data provided in the paper. That's when I came across this error.

I am working with Spectronaut DIA data and was hoping to figure out a way to make it compatible with CurveCurator. Any suggestions on how I can accomplish this?

FloBay commented 5 months ago

Which exact combination of dataset and TOML file did you try to reproduce? The example files on GitHub should all work. I will try to reproduce your error from the example files.

Currently, CurveCurator does not have a built-in Spectronaut parser. We only have a DIA-NN parser at the moment. However, I am generally interested in providing one in a future release. I will try to find a few DIA files and push them through Spectronaut to see the file format it outputs for protein and peptide data.

In the meantime, you can always use the generic data upload. For this you need to specify in the TOML file:

measurement_type = 'OTHER'
data_type = 'OTHER'
search_engine = 'OTHER'

Your input_data.txt file must then have the following structure, where xxxx is your MS intensity and 1 to N are the experiment names you provide in the TOML file:

Name       Raw 1  ...  Raw N
Protein_1  xxxx   ...  xxxx
Protein_2  xxxx   ...  xxxx
Protein_3  xxxx   ...  xxxx
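As an illustration of that layout, here is a minimal sketch that writes such a table from intensities already held in memory. The function name `write_generic_input` is hypothetical and not part of CurveCurator, and the tab delimiter is an assumption; check the CurveCurator documentation for the exact separator it expects. Extracting the intensities from a Spectronaut export is left out, since that file format is not described here.

```python
import csv


def write_generic_input(path, intensities, experiments):
    """Write a 'Name, Raw 1, ..., Raw N' table for the generic upload.

    intensities: dict mapping a protein/peptide name to a list of
                 intensities, one per experiment, in the same order
                 as `experiments`.
    experiments: experiment names as given in the TOML file, e.g. [1, 2, 3].
    Hypothetical helper; tab-separated output is an assumption.
    """
    header = ["Name"] + [f"Raw {label}" for label in experiments]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for name, values in intensities.items():
            # Every row needs exactly one intensity per experiment.
            if len(values) != len(experiments):
                raise ValueError(
                    f"{name}: expected {len(experiments)} values, got {len(values)}"
                )
            writer.writerow([name, *values])
```

For example, `write_generic_input("input_data.txt", {"Protein_1": [10.0, 20.0]}, experiments=[1, 2])` produces a header line with `Name`, `Raw 1`, and `Raw 2`, followed by one row for `Protein_1`.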

I recommend starting with the minimal parameter toml file and then slowly adding filters / other options until you find a setup that fits your data.

Best Flo

grbergeron commented 5 months ago

May I ask if it would be okay to chat via email?

FloBay commented 5 months ago

sure..