What should a standard log of the running process look like?

punching-samuel commented 1 month ago

Hello, I ran BamQuery with the following command:

!python xxx/Software/bamquery/BamQuery/BamQuery.py xxx/Software/bamquery/BamQuery try_1 v38_104 --mode normal --t 8 --plots

My output is as follows:

The output directory already exists in this path. 
BamQuery analysis will continue where it left.
Treatment File : Done!
Reverse Translation : Done!
Alignment : Done!
common_to_modes : Done!

It looks like the process is still running and has not finished yet, but my log file has not been updated at all. My log is as follows:

2024-09-14 14:17:39,174 INFO 
2024-09-14 14:17:39,174 INFO 
2024-09-14 14:17:39,174 INFO BamQuery analysis will continue where it left....
2024-09-14 14:17:39,174 INFO Path to input folder : xxx/Software/bamquery/BamQuery/ 
2024-09-14 14:17:39,175 INFO Path to output folder : xxx/Software/bamquery/output/ 
2024-09-14 14:17:39,175 INFO =============== # ===================
2024-09-14 14:17:39,175 INFO =============== Start Parameters ===============
2024-09-14 14:17:39,176 INFO  - BamQuery id : try_1 
2024-09-14 14:17:39,176 INFO  - Mode : normal, Strandedness :  False, Light:  False 
2024-09-14 14:17:39,176 INFO  - Single-Cell experiment (sc) :  False, Count UMIs : False
2024-09-14 14:17:39,176 INFO  - dbSNP :  0, COMMON SNPs : False, Genome Version : v38_104 
2024-09-14 14:17:39,176 INFO  - Plots : True
2024-09-14 14:17:39,176 INFO  - Keep Variant Alignments : False, Keep High Amount Alignments : False
2024-09-14 14:17:39,176 INFO  - Counting overlapping reads : False
2024-09-14 14:17:39,176 INFO  - Mouse Genome : False
2024-09-14 14:17:39,176 INFO  - Threads : 8
2024-09-14 14:17:39,176 INFO =============== End Parameters ===============
2024-09-14 14:17:39,192 INFO Total Bam Files to Query : 8.
2024-09-14 14:17:39,192 INFO Skipping peptide : Peptide because its length. Peptide should be between 8 and 11 aa.
2024-09-14 14:17:39,210 INFO Peptides to evaluate in Peptide Mode : 3617
2024-09-14 14:17:39,211 INFO Peptides to evaluate in Coding Sequence (CS) Mode : 0
2024-09-14 14:17:39,211 INFO Peptides to evaluate in Manual Mode: 0
2024-09-14 14:17:39,211 INFO Total Peptides to evaluate : 3617
2024-09-14 14:17:39,211 INFO ========== Treatment File : Done! ============ 
2024-09-14 14:17:39,214 INFO Fasta file with all the coding sequences already exists in the output path : xxx/Software/bamquery/output/genome_alignments/try_1.fastq --> Skipping this step!
2024-09-14 14:17:39,214 INFO ========== Reverse Translation : Done! ============ 
2024-09-14 14:17:39,401 INFO Alignment file already exists in the output folder : xxx/Software/bamquery/output/genome_alignments//Aligned.out.sam --> Skipping this step!
2024-09-14 14:17:39,401 INFO Alignment information already collected in the output folder : xxx/Software/bamquery/output/alignments//Alignments_information.dic --> Skipping this step!
2024-09-14 14:17:39,403 INFO Total alignments : 1107 
2024-09-14 14:17:39,403 INFO ========== Alignment : Done! ============ 
2024-09-14 14:17:39,403 INFO ========== Common_to_modes : Done! ============ 
2024-09-14 14:17:39,404 INFO Count information already collected in the output folder : xxx/Software/bamquery/output/res/temps_files/try_1_rna_count.csv --> Skipping this step!
2024-09-14 14:17:39,413 INFO ========== Get Count RNA : Done! ============

I would appreciate it if you could upload or show a standard log, or some information about the intermediate stages.
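To see at a glance which stages a run has reached, the `Done!` markers in a log like the one above can be pulled out with a small script. This is only a sketch based on the line format shown in this log excerpt; the `completed_stages` helper and the regular expression are illustrative and are not part of BamQuery.

```python
import re

# Matches stage markers as they appear in the log excerpt, e.g.
# "2024-09-14 14:17:39,403 INFO ========== Alignment : Done! ============"
# Assumption: every stage marker follows this "<date> <time> INFO === name : Done! ===" shape.
STAGE_RE = re.compile(r"^(\S+ \S+) INFO =+ (.+?)\s*: Done! =+\s*$")


def completed_stages(log_lines):
    """Return (timestamp, stage name) pairs for every 'Done!' marker found."""
    stages = []
    for line in log_lines:
        match = STAGE_RE.match(line)
        if match:
            stages.append((match.group(1), match.group(2)))
    return stages


# Three lines copied from the log above; only two of them are stage markers.
sample = [
    "2024-09-14 14:17:39,211 INFO ========== Treatment File : Done! ============ ",
    "2024-09-14 14:17:39,403 INFO Total alignments : 1107 ",
    "2024-09-14 14:17:39,413 INFO ========== Get Count RNA : Done! ============",
]

for timestamp, stage in completed_stages(sample):
    print(timestamp, stage)
```

Comparing the timestamp of the last marker with the current time gives a rough idea of how long the run has been silent since its last completed stage.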

punching-samuel commented 1 month ago

Regarding '/output/res/temps_files', I checked try_1_rna_count.csv:

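| | Peptide | 2_STAR_sample-1 | 2_STAR_sample-2 | 2_STAR_sample-3 | 2_STAR_sample-4 | 2_STAR_sample-5 | 2_STAR_sample-6 |
|---|---|---|---|---|---|---|---|
| 1 | ASLLDVFVLTR | -2 | -2 | -2 | -2 | -2 | -2 |
| 2 | ATMELYQISQR | -1 | -1 | -1 | -1 | -1 | -1 |
| 3 | AVALINAAIQK | -1 | -1 | -1 | -1 | -1 | -1 |
| 4 | AVIQVSQIVAR | -1 | -1 | -1 | -1 | -1 | -1 |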

All the values in the table are negative numbers. Could you help me understand what this means? Thank you in advance for your help.

Samuel.
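To check how widespread the negative entries are, the count table can be summarised per sample. The sketch below assumes the file is comma-separated with a row-index column followed by a `Peptide` column, as the excerpt suggests; the `negative_counts` helper is illustrative, and the script does not interpret what the values -1 and -2 mean.

```python
import csv
import io


def negative_counts(handle):
    """Count negative entries per sample column in a peptide-by-sample table."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[2:]  # assumption: column 0 is a row index, column 1 is Peptide
    counts = {name: 0 for name in samples}
    n_rows = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        n_rows += 1
        for name, value in zip(samples, row[2:]):
            if float(value) < 0:
                counts[name] += 1
    return n_rows, counts


# Two rows and two sample columns taken from the excerpt above,
# written in the assumed comma-separated layout.
example = io.StringIO(
    ",Peptide,2_STAR_sample-1,2_STAR_sample-2\n"
    "1,ASLLDVFVLTR,-2,-2\n"
    "2,ATMELYQISQR,-1,-1\n"
)

n_rows, counts = negative_counts(example)
print(n_rows, counts)
```

For the real file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")` with the path to try_1_rna_count.csv.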

punching-samuel commented 1 month ago

I selected only 50 peptides from peptides.tsv and ran BamQuery again. The output seems correct:

The output directory already exists in this path. 
BamQuery analysis will continue where it left.
Treatment File : Done!
Reverse Translation : Done!
Alignment : Done!
common_to_modes : Done!
Getting counts for  8  samples
***** WARNING: File xxx/Software/bamquery/lib/genome_versions/genome_v38_104/gencode.v38.primary_assembly.annotation.gtf has inconsistent naming convention for record:
GL000009.2  ENSEMBL gene    56140   58376   .   -   .   gene_id "ENSG00000278704.1"; gene_type "protein_coding"; gene_name "ENSG00000278704"; level 3;

***** WARNING: File xxx/Software/bamquery/lib/genome_versions/genome_v38_104/gencode.v38.primary_assembly.annotation.gtf has inconsistent naming convention for record:
GL000009.2  ENSEMBL gene    56140   58376   .   -   .   gene_id "ENSG00000278704.1"; gene_type "protein_coding"; gene_name "ENSG00000278704"; level 3;

========== Genomic and ERE annotation summary by position : Done! ============ 
========== Genomic and ERE annotation by peptide : Done! ============ 
========== Transcription based biotyping by sample: Done! ============ 
========== Transcription based biotyping by group sample: Done! ============ 
========== BamQuery : Done! ============
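A reduced input like the 50-peptide test can be produced by keeping the first records of the full peptide list. This is a minimal sketch that assumes peptides.tsv holds one record per line with no header row; the `first_n_records` helper is illustrative.

```python
from itertools import islice


def first_n_records(lines, n):
    """Return the first n non-empty lines, preserving their order and content."""
    non_empty = (line for line in lines if line.strip())
    return list(islice(non_empty, n))


# Peptide sequences taken from the count table in this thread; the blank
# line is included to show that empty lines are skipped.
records = ["ASLLDVFVLTR\n", "\n", "ATMELYQISQR\n", "AVALINAAIQK\n"]
subset = first_n_records(records, 2)
print(subset)
```

With a real file, the same function can be applied to an open file handle and the result written to a new peptides.tsv in a separate input folder, so that the original list stays untouched.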

But I would still like to ask: is there a maximum number of peptides allowed in peptides.tsv? My 3617 peptides produced a total of 1107 alignments. Does that mean roughly two-thirds of the peptides were not aligned? How can I improve this? Thank you in advance for your help.

With regards, Samuel.
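The two-thirds estimate can be checked with the numbers from the log. The calculation assumes that each of the 1107 alignments belongs to a different peptide; if a single peptide can have several alignments, the share of aligned peptides is lower, so the figure below is an upper bound on it.

```python
total_peptides = 3617    # "Total Peptides to evaluate" in the log
total_alignments = 1107  # "Total alignments" in the log

# Assumption: at most one peptide per alignment, so this is an upper bound.
max_aligned_fraction = total_alignments / total_peptides
min_unaligned_fraction = 1 - max_aligned_fraction

print(f"at most {max_aligned_fraction:.1%} of peptides aligned")
print(f"at least {min_unaligned_fraction:.1%} of peptides without an alignment")
```

Under that assumption, at least about 69% of the peptides have no alignment, slightly more than two-thirds.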

lemieux-lab / BamQuery

What should a standard log of the running process look like? #6