mhalushka / miRge3.0

Comprehensive analysis of small RNA sequencing data
MIT License

Different "miRge3.0 --help" #30

Closed schmucr1 closed 2 years ago

schmucr1 commented 2 years ago

Hello

I installed the latest version of miRge3.0

miRge3.0 --version
3.0

into a conda environment. However, when I look at the help/documentation I see different output than on the documentation website. Here is what I see

miRge3.0 --help
usage: miRge3.0 [options]

miRge3.0 (Comprehensive analysis of small RNA sequencing Data)

optional arguments:
  -h, --help  show this help message and exit
  --version   show program's version number and exit

Options:
  -s,    --samples            list of one or more samples separated by comma or a file with list of samples separated by new line (accepts *.fastq, *.fastq.gz) 
  -db,   --mir-DB             the reference database of miRNA. Options: miRBase and miRGeneDB (Default: miRBase) 
  -lib,  --libraries-path     the path to miRge libraries 
  -on,   --organism-name      the organism name can be human, mouse, fruitfly, nematode, rat or zebrafish
  -ex,   --crThreshold        the threshold of the proportion of canonical reads for the miRNAs to retain. Range for ex (0 - 0.5), (Default: 0.1)
  -phr,  --phred64            phred64 format (Default: 33)
  -spk,  --spikeIn            switch to annotate spike-ins if spike-in bowtie index files are located at the path of bowtie's index files (Default: off)
  -ie,   --isoform-entropy    switch to calculate isomir entropy (default: off)
  -cpu,  --threads            the number of processors to use for trimming, qc, and alignment (Default: 1)
  -ai,   --AtoI               switch to calculate A to I editing (Default: off)
  -tcf   --tcf-out            switch to write trimmed and collapsed fasta file (Default: off)
  -gff   --gff-out            switch to output isomiR results in gff format (Default: off) 
  -bam   --bam-out            switch to output results in bam format (Default: off) 
  -trf   --tRNA-frag          switch to analyze tRNA fragment and halves (Default: off)
  -o     --outDir             the directory of the outputs (Default: current directory) 
  -dex   --diffex             perform differential expression with DESeq2 (Default: off)
  -mdt   --metadata           the path to metadata file (Default: off, require '.csv' file format if -dex is opted)
  -cms   --chunkmbs           chunk memory in megabytes per thread to use during bowtie alignment (Default: 256)
  -shh   --quiet              enable quiet/silent mode, only show warnings and errors (Default: off)

Data pre-processing:
  -a,    --adapter            Sequence of a 3' adapter. The adapter and subsequent bases are trimmed
  -g,    --front              Sequence of a 5' adapter. The adapter and any preceding bases are trimmed
  -u,    --cut                Remove bases from each read. If LENGTH is positive, remove bases from the beginning. If LENGTH is negative, remove bases from the end
  -nxt,  --nextseq-trim       NextSeq-specific quality trimming (each read). Trims also dark cycles appearing as high-quality G bases
  -q,    --quality-cutoff     Trim low-quality bases from 5' and/or 3' ends of each read before adapter removal. If one value is given, only the 3' end is trimmed
                              If two comma-separated cutoffs are given, the 5' end is trimmed with the first cutoff, the 3' end with the second
  -l,    --length             Shorten reads to LENGTH. Positive values remove bases at the end while negative ones remove bases at the beginning. This and the following
                              modifications are applied after adapter trimming
  -NX,   --trim-n             Trim N's on ends of reads
  -m,    --minimum-length     Discard reads shorter than LEN. (Default: 16)
 of time)
  -mEC,  --miREC              Enable miRNA error correction (miREC)

Do you have any ideas?

Thanks a lot and best wishes, Roland

arunhpatil commented 2 years ago

Hi @schmucr1,

Can you run the command below and let me know whether the release is 0.0.9 or something different?

conda list | grep "mirge"

Thank you, Arun
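
For readers following along: `miRge3.0 --version` reports 3.0, while the packaged release is numbered separately (0.0.9 here). A minimal sketch of the same check from inside Python, assuming the distribution is registered under the name `mirge3` (taken from the conda package name; not confirmed in this thread):

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "mirge3" is an assumption based on the conda package name.
    print("mirge3:", installed_version("mirge3"))
```

Run it with the Python interpreter of the activated conda environment so that it inspects the same installation as `conda list`.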

schmucr1 commented 2 years ago

Dear Arun

Thank you for your rapid answer. Here is the result of the command:

conda activate /scratch/site/u/schmucr1/envs/mirge3
(mirge3) $ conda list | grep "mirge"
# packages in environment at /scratch/site/u/schmucr1/envs/mirge3:
mirge3                    0.0.9              pyh5e36f6f_0    bioconda

So, it seems to be the same version.

I also looked for a docker image containing mirge3 and found this one here "gcfntnu/mirge3:0.0.9"

But it seems to have the same issue: the version is also 0.0.9

singularity run docker://docker-cache.repository.intranet.roche.com/gcfntnu/mirge3:0.0.9 conda list | grep "mirge"
INFO:    Using cached SIF image
# packages in environment at /scratch/site/u/schmucr1/envs/mirge3:
mirge3                    0.0.9              pyh5e36f6f_0    bioconda

and the help is the same as in my installation but different to the documentation

singularity run docker://docker-cache.repository.intranet.roche.com/gcfntnu/mirge3:0.0.9 miRge3.0 --help
INFO:    Using cached SIF image
usage: miRge3.0 [options]

miRge3.0 (Comprehensive analysis of small RNA sequencing Data)

optional arguments:
  -h, --help  show this help message and exit
  --version   show program's version number and exit

Options:
  -s,    --samples            list of one or more samples separated by comma or a file with list of samples separated by new line (accepts *.fastq, *.fastq.gz) 
  -db,   --mir-DB             the reference database of miRNA. Options: miRBase and miRGeneDB (Default: miRBase) 
  -lib,  --libraries-path     the path to miRge libraries 
  -on,   --organism-name      the organism name can be human, mouse, fruitfly, nematode, rat or zebrafish
  -ex,   --crThreshold        the threshold of the proportion of canonical reads for the miRNAs to retain. Range for ex (0 - 0.5), (Default: 0.1)
  -phr,  --phred64            phred64 format (Default: 33)
  -spk,  --spikeIn            switch to annotate spike-ins if spike-in bowtie index files are located at the path of bowtie's index files (Default: off)
  -ie,   --isoform-entropy    switch to calculate isomir entropy (default: off)
  -cpu,  --threads            the number of processors to use for trimming, qc, and alignment (Default: 1)
  -ai,   --AtoI               switch to calculate A to I editing (Default: off)
  -tcf   --tcf-out            switch to write trimmed and collapsed fasta file (Default: off)
  -gff   --gff-out            switch to output isomiR results in gff format (Default: off) 
  -bam   --bam-out            switch to output results in bam format (Default: off) 
  -trf   --tRNA-frag          switch to analyze tRNA fragment and halves (Default: off)
  -o     --outDir             the directory of the outputs (Default: current directory) 
  -dex   --diffex             perform differential expression with DESeq2 (Default: off)
  -mdt   --metadata           the path to metadata file (Default: off, require '.csv' file format if -dex is opted)
  -cms   --chunkmbs           chunk memory in megabytes per thread to use during bowtie alignment (Default: 256)
  -shh   --quiet              enable quiet/silent mode, only show warnings and errors (Default: off)

Data pre-processing:
  -a,    --adapter            Sequence of a 3' adapter. The adapter and subsequent bases are trimmed
  -g,    --front              Sequence of a 5' adapter. The adapter and any preceding bases are trimmed
  -u,    --cut                Remove bases from each read. If LENGTH is positive, remove bases from the beginning. If LENGTH is negative, remove bases from the end
  -nxt,  --nextseq-trim       NextSeq-specific quality trimming (each read). Trims also dark cycles appearing as high-quality G bases
  -q,    --quality-cutoff     Trim low-quality bases from 5' and/or 3' ends of each read before adapter removal. If one value is given, only the 3' end is trimmed
                              If two comma-separated cutoffs are given, the 5' end is trimmed with the first cutoff, the 3' end with the second
  -l,    --length             Shorten reads to LENGTH. Positive values remove bases at the end while negative ones remove bases at the beginning. This and the following
                              modifications are applied after adapter trimming
  -NX,   --trim-n             Trim N's on ends of reads
  -m,    --minimum-length     Discard reads shorter than LEN. (Default: 16)
 of time)
  -mEC,  --miREC              Enable miRNA error correction (miREC)

If you need further information, let me know please.

Thank you! R.

arunhpatil commented 2 years ago

@schmucr1,

This is interesting. Can you try the steps below? If the results are the same, then the installation is probably incomplete. (I also suspected that the conda environment name and the package name might conflict, but that is not the case.)

conda activate mirge3
miRge3.0 -h | tail -30

(The tail -30 command prints the last 30 lines of the miRge3.0 help output.)

Please also run a small test file as described here, and let us know if you get any error. If the last 30 lines do not match the documentation and the test run is not successful, let's troubleshoot any error you get; in the worst case, we will reinstall miRge3.0 with the following commands:

conda activate mirge3
conda remove mirge3
conda install -c bioconda mirge3

Which operating system are you on, and can you send the list of packages installed in this conda environment, generated like below:

conda list > env_packages.txt

Thank you, Arun.

schmucr1 commented 2 years ago

Dear @arunhpatil

Thank you for your suggestions. I did the following,

conda activate  /scratch/site/u/schmucr1/envs/mirge3
miRge3.0 -h | tail -30
  -m,    --minimum-length     Discard reads shorter than LEN. (Default: 16)

and got only one line. That is weird. I am using a bash shell in a browser window. Then I tried the same in a fresh PuTTY terminal and got the correct output, all 30 lines:

 miRge3.0 -h | tail -30
  -m,    --minimum-length     Discard reads shorter than LEN. (Default: 16)
  -umi,  --uniq-mol-ids       Trim nucleotides of specific length at 5’ and 3’ ends of the read, after adapter trimming. eg: 4,4 or 0,4. (Use -udd to remove PCR duplicates)
  -udd,  --umiDedup           Specifies argument to removes PCR duplicates (Default: False); if TRUE it will remove UMI and remove PCR duplicates otherwise it only remove UMI and keep the raw counts (Require -umi option)
  -qumi, --qiagenumi          Removes PCR duplicates of reads obtained from Qiagen platform (Default: Illumina; "-umi x,y " Required)

miRNA Error Correction:
  microRNA correction method for single base substitutions due to sequencing errors (Note: Refines reads at the expense of time)
  -mEC,  --miREC              Enable miRNA error correction (miREC)
  -kh,   --threshold          the value for frequency threshold τ (Default kh = 5)
  -ks,   --kmer-start         kmer range start value (k_1, default 15)
  -ke,   --kmer-end           kmer range end value (k_end, default 20)

Predicting novel miRNAs:
  The predictive model for novel miRNA detection is trained on human and mouse!
  -nmir, --novel-miRNA        include prediction of novel miRNAs
  -minl, --minLength          the minimum length of the retained reads for novel miRNA detection (default: 16)
  -maxl, --maxLength          the maximum length of the retained reads for novel miRNA detection (default: 25)
  -c,    --minReadCounts      the minimum read counts supporting novel miRNA detection (default: 2)
  -mloc, --maxMappingLoci     the maximum number of mapping loci for the retained reads for novel miRNA detection (default: 3)
  -sl,   --seedLength         the seed length when invoking Bowtie for novel miRNA detection (default: 25)
  -olc,  --overlapLenCutoff   the length of overlapped seqence when joining reads into longer sequences based on the coordinate
                              on the genome for novel miRNA detection (default: 14)
  -clc,  --clusterLength      the maximum length of the clustered sequences for novel miRNA detection (default: 30)

Optional PATH arguments:
  -pbwt, --bowtie-path        the path to system's directory containing bowtie binary
  -psam, --samtools-path      the path to system's directory containing samtools binary
  -prf,  --RNAfold-path       the path to system's directory containing RNAfold binary

Thus, I think the installation is good. I need to figure out with my local IT support the reason for the weird behaviour in the browser terminal.

Many thanks for your kind help and best wishes, R.
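
A difference in output encoding between the two terminals is one possible explanation, since the help text contains non-ASCII characters such as τ. The following is a minimal diagnostic sketch, not part of miRge3.0; the function names are hypothetical. Running it in both the browser shell and the PuTTY session would show whether Python sees a different encoding in each:

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if `text` can be written using `encoding`."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError covers unknown or missing encoding names.
        return False


def describe_terminal():
    """Collect the encodings Python would use in the current shell."""
    stdout_encoding = getattr(sys.stdout, "encoding", None) or "unknown"
    return {
        "stdout_encoding": stdout_encoding,
        "preferred_encoding": locale.getpreferredencoding(False),
        "can_print_tau": can_encode("\u03c4", stdout_encoding),
    }


if __name__ == "__main__":
    for key, value in describe_terminal().items():
        print(f"{key}: {value}")
```

If `can_print_tau` is False in only one of the two terminals, the locale settings of that session (for example `LANG` or `LC_ALL`) are a reasonable place to look next.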

PS: Maybe there are some special characters in the help, because when I display only the last 19 lines it seems to work properly:

  -ks,   --kmer-start         kmer range start value (k_1, default 15) 
  -ke,   --kmer-end           kmer range end value (k_end, default 20)

Predicting novel miRNAs:
  The predictive model for novel miRNA detection is trained on human and mouse!
  -nmir, --novel-miRNA        include prediction of novel miRNAs
  -minl, --minLength          the minimum length of the retained reads for novel miRNA detection (default: 16)
  -maxl, --maxLength          the maximum length of the retained reads for novel miRNA detection (default: 25)
  -c,    --minReadCounts      the minimum read counts supporting novel miRNA detection (default: 2)
  -mloc, --maxMappingLoci     the maximum number of mapping loci for the retained reads for novel miRNA detection (default: 3)
  -sl,   --seedLength         the seed length when invoking Bowtie for novel miRNA detection (default: 25)
  -olc,  --overlapLenCutoff   the length of overlapped seqence when joining reads into longer sequences based on the coordinate 
                              on the genome for novel miRNA detection (default: 14)
  -clc,  --clusterLength      the maximum length of the clustered sequences for novel miRNA detection (default: 30)

Optional PATH arguments:
  -pbwt, --bowtie-path        the path to system's directory containing bowtie binary
  -psam, --samtools-path      the path to system's directory containing samtools binary
  -prf,  --RNAfold-path       the path to system's directory containing RNAfold binary

but not with 20 lines

miRge3.0 -h | tail -20
<empty output in my browser shell> 

arunhpatil commented 2 years ago

@schmucr1,

That is great. I guess you are right about the special character (threshold τ), especially in a browser window. You can replace it with (threshold T) in any text editor, on line 104 of this file:

/scratch/site/u/schmucr1/envs/mirge3/lib/python3.8/site-packages/mirge/libs/parse.py

That is, change the following line

-kh, --threshold the value for frequency threshold τ (Default kh = 5)

to

-kh, --threshold the value for frequency threshold T (Default kh = 5)

Let us know whether this prints the whole help message.

Thank you, Arun.
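
The manual edit described above can also be scripted. This is a rough sketch with hypothetical helper names, assuming the file is UTF-8 encoded: `find_non_ascii` lists every non-ASCII character with its position, and `replace_in_file` swaps one character for another while keeping a `.bak` copy.

```python
import shutil
from pathlib import Path


def find_non_ascii(text):
    """Return (line, column, character, code point) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, col, char, f"U+{ord(char):04X}"))
    return hits


def replace_in_file(path, old="\u03c4", new="T", encoding="utf-8"):
    """Replace `old` with `new` in a text file and return the number of replacements.

    A backup named <file>.bak is written before the file is changed.
    """
    path = Path(path)
    text = path.read_text(encoding=encoding)
    count = text.count(old)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace(old, new), encoding=encoding)
    return count


if __name__ == "__main__":
    # One line copied from the help output quoted in this thread.
    sample = "  -kh,   --threshold          the value for frequency threshold τ (Default kh = 5)\n"
    for hit in find_non_ascii(sample):
        print(hit)
```

Running `find_non_ascii` on the contents of parse.py first would show whether τ is the only such character. The `-umi` line as quoted in this thread also shows typographic apostrophes, which may or may not be present in the installed source.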

schmucr1 commented 2 years ago

Dear @arunhpatil

Yes, that workaround helped, and the miRge3.0 help is now shown properly, also in the browser shell. I tried to find a way to configure my browser shell so that it can display special characters such as tau (Unicode character "τ", U+03C4), but I have not found a solution. If you know of one, that would be very much appreciated.

In any case, you may close this issue as solved.

Many thanks for your rapid support and best wishes, R.
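
On the open question of handling τ without editing the package: if the truncation happens because Python cannot encode the character for the terminal (an assumption; the thread does not confirm the cause), Python's own settings may help. Setting the environment variable `PYTHONIOENCODING=utf-8` or `PYTHONUTF8=1` before running the command makes Python write UTF-8 regardless of the locale. Whether the browser shell then renders the glyph correctly depends on that terminal. The sketch below, with a hypothetical helper name, shows a related in-process option that substitutes an escape sequence instead of failing:

```python
import io


def make_tolerant(stream):
    """Make a text stream escape unencodable characters instead of raising."""
    # TextIOWrapper.reconfigure() is available from Python 3.7 onward.
    stream.reconfigure(errors="backslashreplace")
    return stream


if __name__ == "__main__":
    # Simulate a terminal whose encoding cannot represent the character.
    raw = io.BytesIO()
    ascii_stream = io.TextIOWrapper(raw, encoding="ascii")
    make_tolerant(ascii_stream)
    ascii_stream.write("threshold \u03c4\n")
    ascii_stream.flush()
    print(raw.getvalue())

    # For the real standard output, the equivalent call would be:
    #   import sys; make_tolerant(sys.stdout)
```

The environment variables are the less invasive choice because they need no code changes; the `reconfigure` call only applies inside a script that you control.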