2 questions - BAM input files and environment location

Hi, thank you for this awesome program. I have two questions about running it:

Question 1: In what folder should I open the "BQ" Python environment? I currently have two folders relevant to the analysis:

Folder 1: "opt", which contains a "BamQuery" folder and a "lib" folder. The BamQuery folder was downloaded from GitHub, and the lib folder contains a folder with the reference genome I want to use (m30).

Folder 2: "anaconda3", in a different location, which contains all the packages/dependencies relevant to the analysis.

My question is where I should open the python environment.

Question 2: The methods information page states that two inputs are required: the .tsv file with the list of MAPs, and a set of "BAM" files. I don't understand what the BAM files are supposed to contain. The only input data I have is a list of 4 MAPs of interest (each 8-11 amino acids long) that I wish to map to the mouse genome with BamQuery.

Hi Val,

To answer your question 1: the BQ environment can be created and activated in any folder; mine was created in the ./opt/bamquery parent folder. Before running BamQuery, you just need to make sure that you have installed all the dependencies and activated the environment. Once the environment (BQ) is activated, you can run BamQuery:

(BQ) -bash:uger-c002:/BamQuery/Test_downloading_BQ 1029 $ python3 ${INSTALLDIR}/BamQuery/BamQuery.py --help
usage: BamQuery.py [-h] [--mode MODE] [--th_out TH_OUT] [--dbSNP DBSNP] [--c]
                   [--strandedness] [--light] [--sc] [--umi] [--var] [--maxmm]
                   [--overlap] [--plots] [--m] [--dev] [--t T]
                   path_to_input_folder name_exp genome_version

======== BamQuery ========

positional arguments:
  path_to_input_folder  Path to the input folder where to find
                        BAM_directories.tsv and peptides.tsv
  name_exp              BamQuery search Id
  genome_version        Genome human releases : v26_88 / v33_99 / v38_104;
                        Genome mouse releases : M24 / M30

optional arguments:
  -h, --help            show this help message and exit
  --mode MODE           BamQuery search mode : normal / translation
  --th_out TH_OUT       Threshold to assess expression comparation with other
                        tissues
  --dbSNP DBSNP         Human dbSNP : 149 / 151 / 155
  --c                   Take into account the only common SNPs from the dbSNP
                        database chosen
  --strandedness        Take into account strandedness of the samples
  --light               Display only the count and norm count for peptides and
                        regions
  --sc                  Query Single Cell Bam Files
  --umi                 Count UMIs in Single Cell Bam Files
  --var                 Keep Variants Alignments
  --maxmm               Enable STAR to generate a larger number of alignments
  --overlap             Count overlapping reads
  --plots               Plot biotype pie-charts
  --m                   Mouse genome
  --dev                 Save all temps files
  --t T                 Specify the number of processing threads to run
                        BamQuery. The default is 4
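
As an illustration of how the three positional arguments and the optional flags from the help text fit together, here is a minimal Python sketch that only assembles the command line (it does not run BamQuery). The helper name build_bamquery_command, the install directory "/opt", the input path and the search Id "my_search" are all hypothetical placeholders, and passing --m together with M30 for a mouse search is my reading of the help text rather than something confirmed above.

```python
import shlex


def build_bamquery_command(install_dir, input_folder, name_exp, genome_version,
                           mouse=False, threads=None):
    """Assemble a BamQuery command line from the usage shown by --help.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    # Script location mirrors ${INSTALLDIR}/BamQuery/BamQuery.py from the thread.
    script = f"{install_dir}/BamQuery/BamQuery.py"
    # Positional arguments, in the order given by the usage line.
    cmd = ["python3", script, input_folder, name_exp, genome_version]
    if mouse:
        # Assumption: mouse searches need the --m flag listed in the help.
        cmd.append("--m")
    if threads is not None:
        # --t sets the number of processing threads (the help says default is 4).
        cmd.extend(["--t", str(threads)])
    return cmd


# Placeholder values; replace them with your own paths and search Id.
cmd = build_bamquery_command("/opt", "/path/to/input_folder", "my_search",
                             "M30", mouse=True, threads=4)
print(shlex.join(cmd))
```

The printed line can be pasted into the shell once the BQ environment is activated.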

To address question 2: the goal of BamQuery is to provide the RNA-seq expression of MAPs in the samples you are interested in (your own samples, for example normal or cancer). This expression is measured from the BAM files of those samples, which you provide as the list of BAMs. To measure the expression, one of the steps of BamQuery is to align the coding sequences of the MAPs to the mouse genome, so that those genomic locations can be examined in the BAM files and the expression information retrieved. In other words, mapping the MAP coding sequences to the mouse genome is not the goal of BamQuery, but a by-product. I hope this helps!
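
To make the two inputs more concrete, here is a rough Python sketch that writes an input folder containing peptides.tsv and BAM_directories.tsv, the two file names given in the help text. The column layout is an assumption on my part (peptide and a free-text peptide type in peptides.tsv; sample name and BAM path in BAM_directories.tsv, tab-separated, no header), so please check it against the BamQuery documentation. The helper write_input_folder, the peptide sequences, the sample name and the BAM path are made-up placeholders.

```python
import csv
import tempfile
from pathlib import Path


def write_input_folder(folder, peptides, bam_files):
    """Write peptides.tsv and BAM_directories.tsv into `folder`.

    Hypothetical helper; the two-column layouts are assumptions, not confirmed
    in this thread.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Assumed layout: one MAP per line, "<peptide>\t<peptide_type>".
    with open(folder / "peptides.tsv", "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(peptides)
    # Assumed layout: one sample per line, "<sample_name>\t<path_to_BAM>".
    with open(folder / "BAM_directories.tsv", "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(bam_files)
    return folder


# Placeholder data: invented peptides and a BAM path that does not exist.
example_peptides = [("AAAAAAAAL", "candidate"), ("SSSSSSSSSL", "candidate")]
example_bams = [("sample_1", "/path/to/sample_1.bam")]

out = write_input_folder(Path(tempfile.mkdtemp()) / "input",
                         example_peptides, example_bams)
print(sorted(p.name for p in out.iterdir()))
```

The BAM files themselves are RNA-seq alignments of your samples; the sketch only records where they are located.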

lemieux-lab / BamQuery
