Open sundysun99 opened 1 year ago
Hi Sundy,
I have a suggestion and also some questions for you.
First of all, I noticed that you are querying 613 peptides and that alignments in the genome were found for only 200 of them.
So I suggest you include the parameters --var and --maxmm in the search; this will hopefully help to find locations for the missing peptides.
I searched for your peptides: when I do not use the two parameters, 6/7 peptides are missed, but when I add them only 1 peptide is missed. So the parameters should look like this:
2023-11-02 21:23:29,897 INFO =============== Start Parameters ===============
2023-11-02 21:23:29,897 INFO - BamQuery id : mouse_example
2023-11-02 21:23:29,897 INFO - Mode : normal, Strandedness : False, Light: False
2023-11-02 21:23:29,897 INFO - Single-Cell experiment (sc) : False, Count UMIs : False
2023-11-02 21:23:29,897 INFO - dbSNP : mouse_GRCm38, COMMON SNPs : False, Genome Version : M24
2023-11-02 21:23:29,897 INFO - Plots : True
2023-11-02 21:23:29,897 INFO - Keep Variant Alignments : True, Keep High Amount Alignments : True
2023-11-02 21:23:29,897 INFO - Counting overlapping reads : False
2023-11-02 21:23:29,897 INFO - Mouse Genome : True
2023-11-02 21:23:29,897 INFO - Threads : 4
2023-11-02 21:23:29,897 INFO =============== End Parameters ===============
Also, I wanted to make sure that the BAM files you are querying the peptides in were aligned to the murine GRCm38 genome.
Lastly, did you get any errors? It looks like BamQuery didn't finish correctly, that's why you don't have the biotype classification, but the console should indicate if there were any errors that prevented BamQuery from finishing.
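One way to check the reference question above is to compare the contig names and lengths in the BAM header with those of GRCm38. The following is only a minimal sketch: it assumes the header text was obtained separately (for example with `samtools view -H file.bam`), the helper names `parse_sq_lines` and `mismatched_contigs` are hypothetical, and the chr1 length used as the expected value is believed to be the GRCm38 (mm10) one and should be confirmed against your own reference index.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tab-separated field after the record type looks like TAG:value
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def mismatched_contigs(contigs, expected):
    """Return {name: (found_length, expected_length)} for contigs that differ."""
    return {
        name: (contigs.get(name), length)
        for name, length in expected.items()
        if contigs.get(name) != length
    }


# Toy header text, standing in for the output of `samtools view -H file.bam`
header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:195471971\n"

# Assumption: expected GRCm38 chr1 length; verify against your own .fai file
expected_grcm38 = {"chr1": 195471971}

found = parse_sq_lines(header)
problems = mismatched_contigs(found, expected_grcm38)
print(problems if problems else "Header is consistent with the expected contigs")
```

A contig that is missing entirely (for example `1` instead of `chr1`) also shows up in the result, with `None` as the found length, which is a common sign of a naming mismatch between the BAM file and the annotation.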
Hi, thank you for your reply.
(1) Yes, the murine GRCm38 genome is the reference I used.
(2) I tried two sets of BAM files: for the first I added multiple custom parameters when aligning with STAR, and for the second I added none. Neither run output the biotype inference. I assume the BAM file format is the cause of the error, but the tutorial does not seem to mention any BAM file requirements. Is there some information I'm missing?
(3) --var and --maxmm were used, but still only 200/613 peptides could be aligned. When I tried the other BAM files (STAR run with no additional parameters), the number of aligned peptides rose to 500+.
Here is the error I got (BAM files from STAR run with multiple custom parameters):
Traceback (most recent call last):
  File "/xx/bamquery/BamQuery/BamQuery.py", line 646, in <module>
    main(sys.argv[1:])
  File "/xx/bamquery/BamQuery/BamQuery.py", line 604, in main
    BamQuery(path_to_input_folder, path_to_output_folder, name_exp, mode, strandedness, th_out, light, dev, plots, dbSNP, c, super_logger, bam_files_logger, sc, umi, var, maxmm, genome_version, overlap, mouse, t)
  File "/xx/bamquery/BamQuery/BamQuery.py", line 53, in __init__
    self.run_bam_query_normal_mode(bam_files_logger)
  File "/xx/bamquery/BamQuery/BamQuery.py", line 103, in run_bam_query_normal_mode
    plots.get_heat_map(df_counts_rna, self.path_to_output_folder+'plots/heat_maps/transcription_evidence_heatmap/', self.mode, path_temps_file, self.name_exp, '_rna_counts', False, self.th_out)
  File "/xx/bamquery/BamQuery/plotting/plots.py", line 85, in get_heat_map
    tissue = bam_files_info_query[sample][6]
KeyError: 'Position'
With the other BAM files (STAR run with no additional parameters), the error is:
Traceback (most recent call last):
  File "/xx/bamquery/BamQuery/genomics/normalization.py", line 61, in get_normalization
    info_bam_file = dictionary_total_reads_bam_files[name_bam_file]
KeyError: 'Position'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/xx/bamquery/BamQuery/BamQuery.py", line 646, in <module>
    main(sys.argv[1:])
  File "/xx/bamquery/BamQuery/BamQuery.py", line 604, in main
    BamQuery(path_to_input_folder, path_to_output_folder, name_exp, mode, strandedness, th_out, light, dev, plots, dbSNP, c, super_logger, bam_files_logger, sc, umi, var, maxmm, genome_version, overlap, mouse, t)
  File "/xx/bamquery/BamQuery/BamQuery.py", line 53, in __init__
    self.run_bam_query_normal_mode(bam_files_logger)
  File "/xx/bamquery/BamQuery/BamQuery.py", line 108, in run_bam_query_normal_mode
    def_norm_rna = normalization.get_normalization(df_counts_rna, '_rna_norm.csv')
  File "/xx/bamquery/BamQuery/genomics/normalization.py", line 84, in get_normalization
    raise Exception("\nBefore to continue you need to verify that the primary read count for the bam file "+name_bam_file+" is already included in the dictionary. To do so: verify in the log Get_Read_Count_BAM_directories.log that the primary read count processes have finished. Please re-launch BamQuery, once all the primary read counts have been included." )
Exception: Before to continue you need to verify that the primary read count for the bam file Position is already included in the dictionary. To do so: verify in the log Get_Read_Count_BAM_directories.log that the primary read count processes have finished. Please re-launch BamQuery, once all the primary read counts have been included.
Then I checked the code; my bam_files_info_query.dic file contains:
>>> bam_files_info_query
{'bam_file_sample-1_all': ['/xx/bam_file/sample-1_all/2stpass_sample-1_allAligned.sortedByCoord.out.bam', 'unstranded', 'unstranded', '0_treat_1', 83803720, 'R1', '0h'],
 'bam_file_sample-2_all': ['/xx/bam_file/sample-2_all/2stpass_sample-2_allAligned.sortedByCoord.out.bam', 'unstranded', 'unstranded', '0_treat_2', 73483150, 'R2', '0h'],
 'bam_file_sample-3_all': ['/xx/bam_file/sample-3_all/2stpass_sample-3_allAligned.sortedByCoord.out.bam', 'unstranded', 'unstranded', '0_treat_3', 73895430, 'R3', '0h'],
 'bam_file_sample-4_all': ['/xx/bam_file/sample-4_all/2stpass_sample-4_allAligned.sortedByCoord.out.bam', 'unstranded', 'unstranded', '48_treat_1', 70786450, 'R1', '48h'],
 'bam_file_sample-5_all': ['/xx/bam_file/sample-5_all/2stpass_sample-5_allAligned.sortedByCoord.out.bam', 'unstranded', 'unstranded', '48_treat_2', 75071165, 'R2', '48h'],
 'bam_file_sample-6_all': ['/xx/bam_file/sample-6_all/2stpass_sample-6_allAligned.sortedByCoord.out.bam', 'unstranded', 'unstranded', '48_treat_3', 77277771, 'R3', '48h']}
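Both tracebacks fail on the same lookup: the key 'Position' is requested from a dictionary whose keys are the BAM sample names shown above. One possible reading, offered only as a guess, is that a non-sample column of the count table is being treated as a sample. The sketch below illustrates that kind of check; the function name `columns_missing_from_bam_info` and the column list are hypothetical, and the dictionary is an abbreviated copy of the one in this post, not a description of BamQuery's internals.

```python
def columns_missing_from_bam_info(count_columns, bam_files_info):
    """Return the columns that would raise KeyError when looked up as samples."""
    return [col for col in count_columns if col not in bam_files_info]


# Abbreviated copy of the dictionary shown above (two of the six samples)
bam_files_info_query = {
    "bam_file_sample-1_all": [
        "/xx/bam_file/sample-1_all/2stpass_sample-1_allAligned.sortedByCoord.out.bam",
        "unstranded", "unstranded", "0_treat_1", 83803720, "R1", "0h",
    ],
    "bam_file_sample-2_all": [
        "/xx/bam_file/sample-2_all/2stpass_sample-2_allAligned.sortedByCoord.out.bam",
        "unstranded", "unstranded", "0_treat_2", 73483150, "R2", "0h",
    ],
}

# Hypothetical column names of a count table; only 'Position' is taken
# from the traceback, the layout itself is an assumption.
count_columns = ["Position", "bam_file_sample-1_all", "bam_file_sample-2_all"]

missing = columns_missing_from_bam_info(count_columns, bam_files_info_query)
print("Columns not present as sample keys:", missing)

# plots.py indexes each entry with [6], so every entry needs at least 7 fields
short_entries = [k for k, v in bam_files_info_query.items() if len(v) < 7]
print("Entries with fewer than 7 fields:", short_entries)
```

If a check like this reports an unexpected column while every dictionary entry has all seven fields, the dictionary itself is probably fine and the problem lies in how the count table was produced or read back.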
Hi Sundy,
I'm a little confused with your answers. So I think it would be easier to address the problem if we schedule a call, so we can make sure we talk about the same issues. Could you write to me at: maria.virginia.ruiz.cuevas@umontreal.ca ?
Thank you!
Hi, I am trying to use this software to analyze my MS data, but I'm not sure whether the calculation finished, especially since none of the biotype-related results were output.
Here is my input, BAM_directories.tsv:
peptides.tsv (600+ peptides):
And the log file is: