Emmashipman opened this issue 1 year ago
Heya @Emmashipman,
Just came across this while looking at some other issues here. Are you sure the Slurm compute node can actually see the location you are trying to access?
You can check by making a new Slurm script whose only job is to check for the directory you are trying to use, with the call line being something like this:
```
call="ls /home/user/RNASEQDATADUMP/HTS_outputs/"
```
Running that Slurm script and seeing whether it prints the contents of that location or gives you an error will at least tell you whether Slurm itself can see the location. (You could also probably just run `srun ls /home/user/RNASEQDATADUMP/HTS_outputs/` to check, but you might as well do everything the same way in case the script itself is making a difference.)
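For example, a minimal check script along these lines (the job name is arbitrary and the default path is a placeholder; pass the directory your pipeline actually needs as the first argument):

```shell
#!/bin/bash
#SBATCH --job-name=check_dir
#SBATCH --time=00:01:00

# Placeholder default: replace with (or pass in) the directory
# your pipeline needs, e.g. /home/user/RNASEQDATADUMP/HTS_outputs/
DIR="${1:-$HOME}"

if [ -d "$DIR" ]; then
    echo "OK: $DIR is visible from $(hostname)"
    ls "$DIR"
else
    echo "MISSING: $DIR is not visible from $(hostname)" >&2
    exit 1
fi
```

Submit it with `sbatch` and check the `.out` file for either the directory listing or the MISSING message.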
good luck!
Hi, thanks for the reply! I fixed that; it was a small error in my shell script. But I have found a more serious bug: when I run a set of the HTS programs on a pair of PE RNASeq files, I get very strange output and an error message.
The run does not produce any trimmed fastq.gz files, and the JSON record of the programs run only includes a few of them (e.g., hts_NTrimmer, hts_QWindowTrim, and the final hts_Stats) and shows hardly any input or output reads (about 2400, instead of the several million in the file). It looks like this:
```
"Program_details": {
    "program": "hts_NTrimmer",
    "version": "v1.3.3",
    "options": {
        "append-stats-file": "/home/enshipma/RNASEQDATADUMP/HTS_outputs/N_h>
        "exclude": false,
        "force": false,
        "notes": "remove N chars",
        "uncompressed": false
    }
},
"Fragment": {
    "in": 2319,
    "out": 2319,
    "basepairs_in": 694532,
    "basepairs_out": 689580
},
"Single_end": {
    "in": 18,
    "out": 18,
    "basepairs_in": 4263,
    "basepairs_out": 4234,
```
As you can see, the read counts are really low, and the log is missing my first five programs entirely; the first entry it records is hts_NTrimmer.
I also get this error message on some of my runs.
```
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 10) > this->size() (which is 9)
ERROR: There are not either 3 (SE, itab3), 4 (SE, itab with tags) 5 (PE, itab5), or 6 (PE, itab6), or 8 (PE, itab6 with tags) elements within a tab delimited file line
```
I found the error message on the GitHub page, where it is said to come from the I/O handler. I assume it's a problem with the tab-delimited files that get streamed from program to program.
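The valid field counts named in that error (3 or 4 for SE; 5, 6, or 8 for PE) suggest a quick generic diagnostic: count the tab-separated fields on each line of the intermediate stream and flag anything outside those sets. A rough sketch (plain awk, not part of HTStream; the counts are taken from the error message above):

```shell
# Flag lines whose tab-separated field count is not one of the values
# the HTStream error message lists as valid (3/4 for SE, 5/6/8 for PE).
check_itab() {
    awk -F'\t' '!(NF==3 || NF==4 || NF==5 || NF==6 || NF==8) {
        print "line " NR ": " NF " fields"
    }'
}

# Example use spliced between two pipeline steps (writes flags to a
# side file while passing the stream through unchanged):
#   hts_SeqScreener ... | tee >(check_itab > itab_problems.txt) | hts_Stats ...
```

If `itab_problems.txt` ends up non-empty, that tells you which step upstream is emitting malformed lines.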
I ran a test today using only a few of the HTS programs (through hts_SeqScreener) to try to isolate the point at which the files fail. I got a JSON file that looked as expected, but no fastq output files.
Here is my code:

```
hts_Stats -L /home/enshipma/RNASEQDATADUMP/HTS_outputs/B_htsStats.json -N 'initial stats' \
  -1 /home/enshipma/RNASEQDATADUMP/BWTsalt3/BWTsalt3_CKDL220018431-1A_HJ2W5BBXX_L6_1.fq.gz \
  -2 /home/enshipma/RNASEQDATADUMP/BWTsalt3/BWTsalt3_CKDL220018431-1A_HJ2W5BBXX_L6_1.fq.gz | \
hts_SeqScreener -A /home/enshipma/RNASEQDATADUMP/HTS_outputs/B_htsStats.json -N 'screen phix' | \
hts_SeqScreener -A /home/enshipma/RNASEQDATADUMP/HTS_outputs/B_htsStats.json -N 'count rRNAs' \
  --record --seq /home/enshipma/RNASEQDATADUMP/wheat_rrna.fasta | \
hts_Stats -A /home/enshipma/RNASEQDATADUMP/HTS_outputs/B_htsStats.json -N 'final stats' \
  -f /home/enshipma/RNASEQDATADUMP/HTS_outputs/B_hts_test1/
```
The slurmout file seems to have written all 55 million lines to stdout, and the error file is blank. I have no fastq-format output files from this run. My next step will be to add the program that follows hts_SeqScreener and see whether the stream fails there...
I am just wondering if you have encountered this issue before and if you know what step can end up being overzealous in trimming?
Hello,
I am trying to use HTStream. I based my code on the tutorial here: https://ucdavis-bioinformatics-training.github.io/2020-mRNA_Seq_Workshop/data_reduction/01-preproc_htstream_mm
and referred heavily to its sbatch example script:
```bash
#!/bin/bash
#SBATCH --job-name=htstream_N
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --time=12:00:00
#SBATCH --mem=32000
#SBATCH --output=slurmout/htstreamN%A_%a.out # File to which STDOUT will be written
#SBATCH --error=slurmout/htstreamN%A_%a.err  # File to which STDERR will be written
#SBATCH --mail-user=
#SBATCH --mail-type=ALL

start=$(date +%s)
echo $HOSTNAME
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

module load deprecated/HTStream/1.3.3

call="hts_Stats -L /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'initial stats' \
      -1 /home/user/RNASEQDATADUMP/NWTsalt8/NWTsalt8_CKDL220024596-1A_HJ2W5BBXX_L6_1.fq.gz \
      -2 /home/user/RNASEQDATADUMP/NWTsalt8/NWTsalt8_CKDL220024596-1A_HJ2W5BBXX_L6_1.fq.gz | \
      hts_SeqScreener -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'screen phix' | \
      hts_SeqScreener -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'count rRNAs' \
      --record --seq /home/user/RNASEQDATADUMP/wheat_rrna.fasta | \
      hts_Overlapper -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'overlap PE reads' | \
      hts_AdapterTrimmer -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'trim adapters' | \
      hts_PolyATTrim -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'trim polyAT' | \
      hts_NTrimmer -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'remove N chars' | \
      hts_QWindowTrim -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'quality trim ends' | \
      hts_LengthFilter -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'remove reads < 50 bp' \
      -n -m 50 | \
      hts_Stats -A /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json -N 'final stats' \
      -f /home/user/RNASEQDATADUMP/HTS_outputs/N_hts_paired.htstream"

echo $call
eval $call

end=$(date +%s)
runtime=$((end-start))
echo $runtime
```
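One crude way to see which programs actually made it into one of those JSON logs is to grep for the fields shown in the excerpt further up, without parsing the whole file. A quick sketch (the field name `"program"` is taken from that excerpt, and the log path in the example is a placeholder; adjust if your HTStream version writes something different):

```shell
# Print the "program" field of every entry in an HTStream stats log.
list_logged_programs() {
    grep -o '"program": *"[^"]*"' "$1"
}

# Count how many program entries the log contains.
count_logged_entries() {
    grep -c '"program":' "$1"
}

# Example:
#   list_logged_programs /home/user/RNASEQDATADUMP/HTS_outputs/N_htsStats.json
```

If the count comes back as 5 instead of 10, that confirms the first half of the pipeline never wrote to the log.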