siheming / mspypeline

Package to analyze Mass Spec Data
MIT License

Can't find saved plot files #35

Closed ilivyatan closed 1 year ago

ilivyatan commented 1 year ago

Hi, I've managed to run the mspypeline GUI on a /txt folder created by MaxQuant. "Create report" works and generates a PDF in the directory just above /txt. But when I ask for a figure to be generated, the DEBUG output indicates that it was successful, yet I can't find any figures. What am I doing wrong?

The output looks like this:

(mspypeline) [ilivya@access3 Proteomics]$
2023-01-18 15:17:12,328 - MSPInitializer - DEBUG - Updating yml settings file
2023-01-18 15:17:12,337 - MQReader - INFO - Required files: ['proteinGroups.txt']
2023-01-18 15:17:12,337 - MQReader - DEBUG - Got configs: ordereddict([('all_replicates', ['Rev-2-T-6 1', 'Rev-2-T-6 2', 'Rev-2-T-6 3', 'T-25 Adh 1', 'T-25 Adh 2', 'T-25 Adh 3', 'HU-1 1', 'HU-1 2', 'HU-1 3', 'T-63 1', 'T-63 2', 'T-63 3']), ('analysis_design', ordereddict([('Rev-2-T-6 1', 'Rev-2-T-6 1'), ('Rev-2-T-6 2', 'Rev-2-T-6 2'), ('Rev-2-T-6 3', 'Rev-2-T-6 3'), ('T-25 Adh 1', 'T-25 Adh 1'), ('T-25 Adh 2', 'T-25 Adh 2'), ('T-25 Adh 3', 'T-25 Adh 3'), ('HU-1 1', 'HU-1 1'), ('HU-1 2', 'HU-1 2'), ('HU-1 3', 'HU-1 3'), ('T-63 1', 'T-63 1'), ('T-63 2', 'T-63 2'), ('T-63 3', 'T-63 3')])), ('levels', 1), ('level_names', [0])])
2023-01-18 15:17:12,346 - MSPInitializer - INFO - Reading pathway and GO list of interest
2023-01-18 15:17:12,346 - MSPInitializer - DEBUG - Updating yml settings file
2023-01-18 15:17:12,354 - MSPInitializer - DEBUG - Found config dir
2023-01-18 15:17:12,354 - MSPInitializer - DEBUG - Found config.yml file in config dir
2023-01-18 15:17:12,354 - MSPInitializer - DEBUG - Found config dir
2023-01-18 15:17:12,354 - MSPInitializer - DEBUG - Found config.yml file in config dir
2023-01-18 15:17:12,354 - MSPInitializer - DEBUG - yml file location: /home/labs/straussman/ilivya/Proteomics/config/config.yml
2023-01-18 15:17:12,354 - MSPInitializer - INFO - loading yml file
2023-01-18 15:17:12,372 - MSPInitializer - DEBUG - Config file contents: ordereddict([('selected_reader', 'mqreader'), ('selected_normalizer', 'quantile_norm_missing_handled'), ('has_techrep', True), ('use_protein_id', False), ('equal_variance', False), ('pathways', ['BIOCARTA_EPO_PATHWAY.txt']), ('go_terms', []), ('mqreader', ordereddict([('all_replicates', ['Rev-2-T-6 1', 'Rev-2-T-6 2', 'Rev-2-T-6 3', 'T-25 Adh 1', 'T-25 Adh 2', 'T-25 Adh 3', 'HU-1 1', 'HU-1 2', 'HU-1 3', 'T-63 1', 'T-63 2', 'T-63 3']), ('analysis_design', ordereddict([('Rev-2-T-6 1', 'Rev-2-T-6 1'), ('Rev-2-T-6 2', 'Rev-2-T-6 2'), ('Rev-2-T-6 3', 'Rev-2-T-6 3'), ('T-25 Adh 1', 'T-25 Adh 1'), ('T-25 Adh 2', 'T-25 Adh 2'), ('T-25 Adh 3', 'T-25 Adh 3'), ('HU-1 1', 'HU-1 1'), ('HU-1 2', 'HU-1 2'), ('HU-1 3', 'HU-1 3'), ('T-63 1', 'T-63 1'), ('T-63 2', 'T-63 2'), ('T-63 3', 'T-63 3')])), ('levels', 1), ('level_names', [0])])), ('plot_normalization_overview_all_normalizers_settings', ordereddict([('create_plot', False), ('dfs_to_use', ['lfq_log2']), ('levels', [])])), ('plot_heatmap_overview_all_normalizers_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_detection_counts_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_detected_proteins_per_replicate_settings', ordereddict([('create_plot', False), ('dfs_to_use', ['lfq_log2']), ('levels', [])])), ('plot_venn_results_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_venn_groups_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_pca_overview_settings', ordereddict([('create_plot', True), ('dfs_to_use', ['lfq_log2']), ('levels', []), ('no_missing_values', True)])), ('plot_intensity_histograms_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_relative_std_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_scatter_replicates_settings', ordereddict([('create_plot', False), ('dfs_to_use', ['lfq_log2']), ('levels', [])])), ('plot_experiment_comparison_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_rank_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_pathway_analysis_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_go_analysis_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', [])])), ('plot_r_volcano_settings', ordereddict([('create_plot', False), ('dfs_to_use', []), ('levels', []), ('adj_pval', False)]))])
2023-01-18 15:17:12,396 - MQReader - DEBUG - Reading proteinGroups from disk
2023-01-18 15:17:12,563 - MQReader - DEBUG - Removing 225 rows from proteinGroups.txt because they are marked as contaminant
2023-01-18 15:17:13,190 - MQReader - DEBUG - Shape of df_contaminants before masking (5056, 151)
2023-01-18 15:17:13,198 - MQReader - DEBUG - Shape of df_contaminants after masking (5037, 151)
2023-01-18 15:17:13,199 - MQReader - INFO - Setting index of proteinGroups.txt to Gene name
2023-01-18 15:17:13,205 - MQReader - WARNING - Found duplicates in Gene name column. Duplicate names: KTN1, PSMC3, MSH3, KTN1, TUBA4A, TUBA4A, MBNL1, UBAP2L, CNBP, EEF1D, LAT, ATXN2L, LAT, PICALM, PRMT1, PRMT1, CLASP2, MYL6, MYL6, HNRNPC, PCBP2, LARP4, HMGA1, AAK1, UNC45A, CAPZB, PPIH, 0610010K14RIK, ALDOA, NAA10, HNRNPK, PCBP2, UNC45A, 0610010K14RIK, GLS, CTC1, RBM26, GIT2, SH3KBP1, SF1, TPM3, RBM26, P4HA1, NUMA1, TPM3, CLASP2, VDAC1, GLS, SF1, GLS, ARHGEF6, NUMA1, PSMD4, ANXA6, ERH, USP9X, H2-D1, DNM2, MBNL1, LARP4, EIF4G2, ARHGEF6, DNMT1, HBS1L, METAP2, METAP2, PSMD4, CALU, PSMC3, H2-D1, ALDOA, MSH3, DNMT1, ANXA6, HMGA1, DNM2, CAPZB, CNBP, EEF1D, HNRNPK, RAB1A, USP9X, ERH, ATXN2L, HBS1L, NDUFV3, AAK1, NAA10, CTC1, RAB1A, P4HA1, VDAC1, TMPO, TMPO, PCBP2, EIF4G2, CALU, PICALM, UBAP2L, NDUFV3, TARDBP, SH3KBP1, TARDBP, PPIH, GIT2, HNRNPC
2023-01-18 15:17:13,208 - MQReader - WARNING - Merging 106 rows into 52 by summing numerical columns. Some information might be incorrect
2023-01-18 15:17:13,329 - MQReader - DEBUG - proteinGroups.txt shape after preprocessing: (4983, 151)
2023-01-18 15:17:13,330 - MaxQuantPlotter - DEBUG - Adding option raw and raw_log2
2023-01-18 15:17:13,649 - MaxQuantPlotter - DEBUG - Adding option raw_quantile_norm_missing_handled and raw_quantile_norm_missing_handled_log2
2023-01-18 15:17:13,926 - MaxQuantPlotter - DEBUG - Adding option raw_normalized and raw_normalized_log2
2023-01-18 15:17:13,951 - MaxQuantPlotter - DEBUG - Adding option lfq and lfq_log2
2023-01-18 15:17:14,195 - MaxQuantPlotter - DEBUG - Adding option lfq_quantile_norm_missing_handled and lfq_quantile_norm_missing_handled_log2
2023-01-18 15:17:14,431 - MaxQuantPlotter - DEBUG - Adding option lfq_normalized and lfq_normalized_log2
2023-01-18 15:17:14,452 - MaxQuantPlotter - DEBUG - Adding option ibaq and ibaq_log2
2023-01-18 15:17:14,743 - MaxQuantPlotter - DEBUG - Adding option ibaq_quantile_norm_missing_handled and ibaq_quantile_norm_missing_handled_log2
2023-01-18 15:17:15,036 - MaxQuantPlotter - DEBUG - Adding option ibaq_normalized and ibaq_normalized_log2
2023-01-18 15:17:15,059 - MaxQuantPlotter - DEBUG - got global settings: {}
2023-01-18 15:17:15,059 - MaxQuantPlotter - DEBUG - creating plot plot_pca_overview
2023-01-18 15:17:15,060 - MaxQuantPlotter - INFO - Done creating plots

siheming commented 1 year ago

Hey @ilivyatan, looking at the log, I would assume that the inferred experiment design is wrong, since all rows seem to be in the csv twice. Could you maybe provide the yml file created by mspypeline and a header of your MaxQuant data? Otherwise, you could try to fix the design manually based on the information here: https://mspypeline.readthedocs.io/en/latest/settings_and_configuration.html#analysis-design
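To illustrate what an inferred design with a single level looks like, here is a simplified sketch. It assumes, as our reading of the linked analysis-design docs suggests, that levels are derived by splitting sample names on `_`. The function `infer_design` and the renaming rule are hypothetical and are not mspypeline's actual code. With the replicate names from the log, which contain no `_`, every replicate ends up as its own top-level group, consistent with `('levels', 1)` and the one-to-one `analysis_design` in the config dump.

```python
from collections import defaultdict

def infer_design(sample_names, sep="_"):
    """Hypothetical, simplified design inference (NOT mspypeline's code).

    Assumes levels are separated by `sep`; the first part of each name is
    treated as the top-level group. Returns (levels, {group: [names]}).
    """
    levels = max(len(name.split(sep)) for name in sample_names)
    groups = defaultdict(list)
    for name in sample_names:
        groups[name.split(sep)[0]].append(name)
    return levels, dict(groups)

# Replicate names as they appear in the log ('all_replicates').
original = [
    "Rev-2-T-6 1", "Rev-2-T-6 2", "Rev-2-T-6 3",
    "T-25 Adh 1", "T-25 Adh 2", "T-25 Adh 3",
    "HU-1 1", "HU-1 2", "HU-1 3",
    "T-63 1", "T-63 2", "T-63 3",
]

# Hypothetical renaming: drop spaces inside the group name and join group
# and replicate number with "_", e.g. "T-25 Adh 2" -> "T-25Adh_2".
renamed = []
for name in original:
    group, replicate = name.rsplit(" ", 1)
    renamed.append(f"{group.replace(' ', '')}_{replicate}")

levels, groups = infer_design(original)
print(levels, len(groups))  # 1 12: every replicate is its own group

levels, groups = infer_design(renamed)
print(levels, sorted(groups))  # 2 ['HU-1', 'Rev-2-T-6', 'T-25Adh', 'T-63']
```

With the renamed samples, the three replicates of each condition fall under one shared group, which is what a comparison between conditions needs.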

ilivyatan commented 1 year ago

Hi, thanks for the quick reply! I've attached the .yml. I'm new to this tool, so I may have some stupid startup questions. I'm not sure what you mean by the experiment design. I've attached the Groups.txt. The proteomics team did some analysis with MaxQuant, but I wanted to delve deeper with your software. I've attached the report that your tool created. It just seems to get stuck on the plots.

I'll look into the analysis design document you suggested. Thanks, Ilana



| Name | Allgroups |
|---|---|
| LFQ intensity HU-1 1 | LFQ intensity HU-1 |
| LFQ intensity HU-1 2 | LFQ intensity HU-1 |
| LFQ intensity HU-1 3 | LFQ intensity HU-1 |
| LFQ intensity Rev-2-T-6 1 | LFQ intensity Rev-2-T-6 |
| LFQ intensity Rev-2-T-6 2 | LFQ intensity Rev-2-T-6 |
| LFQ intensity Rev-2-T-6 3 | LFQ intensity Rev-2-T-6 |
| LFQ intensity T-25 Adh 1 | LFQ intensity T-25 Adh |
| LFQ intensity T-25 Adh 2 | LFQ intensity T-25 Adh |
| LFQ intensity T-25 Adh 3 | LFQ intensity T-25 Adh |
| LFQ intensity T-63 1 | LFQ intensity T-63 |
| LFQ intensity T-63 2 | LFQ intensity T-63 |
| LFQ intensity T-63 3 | LFQ intensity T-63 |
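The Name/Allgroups table pairs each LFQ intensity column with its group. A minimal sketch of reading such a table into a group-to-replicates mapping, assuming it is saved as a tab-separated file with exactly the headers `Name` and `Allgroups`, and that the `LFQ intensity ` prefix should be dropped so the names match the replicate names in the log. `groups_from_table` is a hypothetical helper, not part of mspypeline.

```python
import csv
import io
from collections import defaultdict

PREFIX = "LFQ intensity "

def strip_prefix(text, prefix=PREFIX):
    """Remove the MaxQuant column prefix if it is present."""
    return text[len(prefix):] if text.startswith(prefix) else text

def groups_from_table(tsv_text):
    """Hypothetical helper: map each Allgroups entry to its replicate names."""
    design = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        design[strip_prefix(row["Allgroups"])].append(strip_prefix(row["Name"]))
    return dict(design)

# Rows copied from the Name/Allgroups table in this comment.
rows = [
    ("LFQ intensity HU-1 1", "LFQ intensity HU-1"),
    ("LFQ intensity HU-1 2", "LFQ intensity HU-1"),
    ("LFQ intensity HU-1 3", "LFQ intensity HU-1"),
    ("LFQ intensity Rev-2-T-6 1", "LFQ intensity Rev-2-T-6"),
    ("LFQ intensity Rev-2-T-6 2", "LFQ intensity Rev-2-T-6"),
    ("LFQ intensity Rev-2-T-6 3", "LFQ intensity Rev-2-T-6"),
    ("LFQ intensity T-25 Adh 1", "LFQ intensity T-25 Adh"),
    ("LFQ intensity T-25 Adh 2", "LFQ intensity T-25 Adh"),
    ("LFQ intensity T-25 Adh 3", "LFQ intensity T-25 Adh"),
    ("LFQ intensity T-63 1", "LFQ intensity T-63"),
    ("LFQ intensity T-63 2", "LFQ intensity T-63"),
    ("LFQ intensity T-63 3", "LFQ intensity T-63"),
]
tsv_text = "Name\tAllgroups\n" + "\n".join(f"{n}\t{g}" for n, g in rows)

design = groups_from_table(tsv_text)
for group, replicates in design.items():
    print(group, replicates)
```

The result is the intended design (four groups with three replicates each), which could then be turned into new sample names that mspypeline can parse.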

siheming commented 1 year ago

Hey Ilana, attaching the yaml file via email does not seem to work. Please copy and paste the contents of the file into a message, or post it on GitHub directly. I am not really sure what the Allgroups are; I have not worked on the bioinformatics side in quite a while. But judging from the names of the different columns, my tool cannot infer what you want to compare with what. You will probably need a sample mapping file as described in the analysis design wiki.
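A minimal sketch of generating such a mapping file. It assumes, based on our reading of the linked analysis-design docs, that mspypeline looks for a tab-separated `sample_mapping.txt` in the `config` directory (for this run the log shows `/home/labs/straussman/ilivya/Proteomics/config/`) with the header `old name` and `new name`; verify the file name, header, and naming convention against the docs before relying on it. The renaming rule in `new_name` is a hypothetical choice that produces `group_replicate` names.

```python
import csv
import tempfile
from pathlib import Path

def new_name(old):
    """Hypothetical rule: 'T-25 Adh 2' -> 'T-25Adh_2' (group_replicate)."""
    group, replicate = old.rsplit(" ", 1)
    return f"{group.replace(' ', '')}_{replicate}"

def write_sample_mapping(old_names, config_dir):
    """Write an ASSUMED 'old name'/'new name' tab-separated file into config_dir."""
    path = Path(config_dir) / "sample_mapping.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["old name", "new name"])
        for old in old_names:
            writer.writerow([old, new_name(old)])
    return path

# Replicate names as they appear in the log ('all_replicates').
replicates = [
    "Rev-2-T-6 1", "Rev-2-T-6 2", "Rev-2-T-6 3",
    "T-25 Adh 1", "T-25 Adh 2", "T-25 Adh 3",
    "HU-1 1", "HU-1 2", "HU-1 3",
    "T-63 1", "T-63 2", "T-63 3",
]

# Demo in a throwaway directory; pass your real config dir instead.
with tempfile.TemporaryDirectory() as tmp:
    mapping = write_sample_mapping(replicates, tmp)
    print(mapping.read_text())
```

Generating the file from the replicate list, rather than typing it by hand, keeps the old names identical to what the reader reports, so a typo cannot silently drop a sample from the mapping.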

siheming commented 1 year ago

Hey, since I have not heard back, I will close this for now, but feel free to reopen it if you have additional questions.