wososa / PSI-Sigma

Input files not found in Docker #34

Closed CherWeiYuan closed 2 years ago

CherWeiYuan commented 2 years ago

Hello Woody

May I ask where my afolder (or the files in it) should be stored when running PSI-Sigma using Docker?

Below is the code I ran, which gave me the error "Aborting.. Can't open groupa.txt : No such file or directory"

# STAR mapping
cd ${dir}/fastq_trimmed
for prefix in $(ls *fastq.gz | rev | cut -c 17- | rev | uniq)
do 
echo "Working on ${prefix}"
docker run --rm -v $dir:/my_data rssbred/rnacocktail:0.3.2 \
STAR \
--runThreadN ${threads} \
--genomeDir /my_data/star_index \
--outFileNamePrefix /my_data/STAR/${prefix}. \
--readFilesCommand gunzip -c \
--outSAMtype BAM SortedByCoordinate \
--outFilterIntronMotifs RemoveNoncanonical \
--twopassMode Basic \
--readFilesIn /my_data/fastq_trimmed/${prefix}_R1_001.fastq.gz \
/my_data/fastq_trimmed/${prefix}_R2_001.fastq.gz 
done

cd ${dir}/fastq_trimmed
for prefix in $(ls *fastq.gz | rev | cut -c 17- | rev | uniq)
do 
echo "Working on ${prefix}"
cd ${dir}/STAR
docker run --rm -v ${dir}:/my_data rssbred/rnacocktail:0.3.2 \
samtools index /my_data/STAR/${prefix}.Aligned.sortedByCoord.out.bam
done

## PSI-Sigma

# Create links to bam and sj file
mkdir -p ${dir}/afolder
cd ${dir}/afolder
ln -s ${dir}/STAR/*.bam* .
ln -s ${dir}/STAR/*.SJ.* .

# Create treatment status text file
echo treated_*.Aligned.sortedByCoord.out.bam > groupa.txt
echo control_*.Aligned.sortedByCoord.out.bam > groupb.txt

# Run PSI-Sigma
docker run --detach --name psi_container --rm \
-v $dir:/my_data -it docker.io/woodydon/psi_sigma_pipeline:3.6 

docker exec -w /my_data psi_container \
perl /usr/local/bin/PSI-Sigma-1.9r/dummyai.pl \
--gtf /my_data/gtf/gencode.v38.annotation.sorted.gtf \
--name PSIsigma \
--type 1 \
-nread 10

The full output:

gtf = /my_data/gtf/gencode.v38.annotation.sorted.gtf
name = PSIsigma
type = 1
nread = 10
skipratio = 0.05
fmode = 0
irmode = 0
adjp = 0
trimp = 5
denominator = 0
irrange = 5
variance assumption = equal (Student's t-test)
Path = /usr/local/bin/PSI-Sigma-1.9r
Aborting.. Can't open groupa.txt : No such file or directory

I also tried downloading the source code, moving the afolder into it, and executing it in the Docker container, but the same error occurred:

docker exec psi_container \
perl /my_data/PSI-Sigma-1.9r/dummyai.pl \
--gtf /my_data/gtf/gencode.v38.annotation.sorted.gtf \
--name PSIsigma \
--type 1 \
-nread 10

Best Regards WY

wososa commented 2 years ago

Hi @CherWeiYuan ,

Thanks for trying PSI-Sigma. Sorry that you encountered this usability issue. The groupa.txt and groupb.txt files should be in the folder from which you execute the PSI-Sigma command (here, afolder), because they are opened relative to the current working directory. By the way, you may want to use .gtf from Ensembl instead of GENCODE (https://useast.ensembl.org/info/data/ftp/index.html).
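
To illustrate why the working directory matters: a relative name such as groupa.txt is resolved against the current directory of the process, so the same command can fail in one directory and succeed in another. The following is a minimal sketch in a throwaway directory. The variable demo_root and the file contents are made up for illustration, and cat only stands in for PSI-Sigma opening the file.

```shell
# Throwaway directory tree standing in for the mounted data directory (hypothetical).
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/afolder"
echo "treated_1.Aligned.sortedByCoord.out.bam" > "${demo_root}/afolder/groupa.txt"

# From the parent directory the relative name does not resolve ...
if (cd "${demo_root}" && cat groupa.txt >/dev/null 2>&1); then
    from_parent="found"
else
    from_parent="missing"
fi

# ... but from inside afolder it does.
if (cd "${demo_root}/afolder" && cat groupa.txt >/dev/null 2>&1); then
    from_afolder="found"
else
    from_afolder="missing"
fi

echo "from parent: ${from_parent}, from afolder: ${from_afolder}"
```

With Docker, the working directory inside the container can be chosen with the -w option of docker run or docker exec, for example -w /my_data/afolder when the data directory is mounted at /my_data.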

Let me know if the problem continues.

Best, Woody

CherWeiYuan commented 2 years ago

Hi @wososa

Thank you for your pointers. I managed to run PSI-Sigma using only Docker implementations.

The docker working directory was set to afolder, and I used Ensembl's genome assembly (Homo_sapiens.GRCh38.dna.primary_assembly.fa) and gtf (Homo_sapiens.GRCh38.105.gtf) as per your guidance.

I did not manage to use symlinks with Docker; instead I copied the PSI-Sigma input files into the afolder.
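
One likely reason the symlinks failed (an assumption, not verified against this setup): a command like ln -s ${dir}/STAR/*.bam* . stores absolute host paths as link targets, and those paths do not exist inside the container, where the data directory appears as /my_data. Relative link targets should stay valid as long as the link and its target are both inside the mounted directory. The sketch below simulates the change of path by renaming the directory; all file and directory names in it are made up.

```shell
# Throwaway tree: "host" stands in for ${dir} as seen on the host (hypothetical names).
work=$(mktemp -d)
mkdir -p "${work}/host/STAR" "${work}/host/afolder"
touch "${work}/host/STAR/sample.bam"

# Absolute target: the link stores the full host path.
ln -s "${work}/host/STAR/sample.bam" "${work}/host/afolder/abs.bam"
# Relative target: the link stores a path relative to its own directory.
ln -s ../STAR/sample.bam "${work}/host/afolder/rel.bam"

# Renaming the tree stands in for seeing it under a different path,
# as happens when ${dir} is mounted at /my_data in a container.
mv "${work}/host" "${work}/container"

# [ -e ] follows symlinks, so it is false for a dangling link.
if [ -e "${work}/container/afolder/abs.bam" ]; then abs_state="resolves"; else abs_state="dangling"; fi
if [ -e "${work}/container/afolder/rel.bam" ]; then rel_state="resolves"; else rel_state="dangling"; fi
echo "absolute link: ${abs_state}, relative link: ${rel_state}"
```

Applied to the first script in this thread, that would mean creating the links from inside afolder with relative targets, e.g. ln -s ../STAR/*.bam* . (not tested with PSI-Sigma here). Copying the files, as done below, avoids the question entirely at the cost of disk space.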

For people out there who want to use Docker, I provide the relevant parts of my bash script here:

# Unzip all fastq files
cd ${dir}/fastq_trimmed
gunzip *.fastq.gz 

# Run STAR
cd ${dir}/fastq_trimmed
for prefix in $(ls *fastq | rev | cut -c 14- | rev | uniq)
do 
echo "Working on ${prefix}"
docker run --rm -v $dir:/my_data rssbred/rnacocktail:0.3.2 \
STAR \
--readFilesIn /my_data/fastq_trimmed/${prefix}_R1_001.fastq \
/my_data/fastq_trimmed/${prefix}_R2_001.fastq \
--runThreadN ${threads} \
--genomeDir /my_data/star_index \
--outFileNamePrefix /my_data/star_mapped/${prefix}. \
--outSAMtype BAM SortedByCoordinate \
--outFilterIntronMotifs RemoveNoncanonical \
--twopassMode Basic 

docker run --rm -v ${dir}:/my_data rssbred/rnacocktail:0.3.2 \
samtools index /my_data/star_mapped/${prefix}.Aligned.sortedByCoord.out.bam
done

## PSI-Sigma
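# Create afolder directory in which PSI-Sigma runs, and work from inside it
# so that the sorted gtf and the group files below are written there
mkdir -p ${dir}/afolder
cd ${dir}/afolder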

# Copy gtf file to afolder
cp ${dir}/gtf/Homo_sapiens.GRCh38.105.gtf ${dir}/afolder

# Sort gtf
(grep "^#" Homo_sapiens.GRCh38.105.gtf; grep -v "^#" Homo_sapiens.GRCh38.105.gtf | sort -k1,1 -k4,4n) > Homo_sapiens.GRCh38.105.sorted.gtf

# Copy input files to afolder
cp ${dir}/star_mapped/*.bam* ${dir}/afolder
cp ${dir}/star_mapped/*.SJ.* ${dir}/afolder

# Create treatment status text file (overwrite, so that re-running the script does not append duplicates)
echo treated_*.Aligned.sortedByCoord.out.bam > groupa.txt
echo control_*.Aligned.sortedByCoord.out.bam > groupb.txt

# Run PSI-Sigma
docker run --rm -v ${dir}:/my_data -w /my_data/afolder \
docker.io/woodydon/psi_sigma_pipeline:3.6 \
perl /usr/local/bin/PSI-Sigma-1.9r/dummyai.pl \
--gtf Homo_sapiens.GRCh38.105.sorted.gtf \
--name PSIsigma \
--type 1 \
-nread 10

Best Regards Wei Yuan

wososa commented 2 years ago

Hi @CherWeiYuan ,

Thanks for posting your script! Very helpful~!

By the way, the Homo_sapiens.GRCh38.105.gtf needs to be sorted. Could you please add the sort command?

Woody

CherWeiYuan commented 2 years ago

Hi @wososa

Thank you for the feedback! I have edited the script. There were no errors, and output was produced successfully even without the sorted gtf. Does sorting the gtf affect the speed of the program, or does it affect the results?

Best Regards Wei Yuan

wososa commented 2 years ago

Hi @CherWeiYuan ,

Yes, sorting is needed and will affect some of the results (for genes on the negative strand).
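
Because an unsorted gtf produces output without any error, it can help to check the ordering before a run. The following is a rough sketch, not part of PSI-Sigma: gtf_is_sorted is a hypothetical helper, the two-record gtf is made up, and the ordering tested (column 1, then column 4 numerically) is simply the one used by the sort command in the script above. Setting LC_ALL=C and a tab separator is an added assumption to make the comparison reproducible.

```shell
# Hypothetical helper: exit status 0 only if the non-comment lines are ordered
# by column 1 (chromosome), then numerically by column 4 (start).
# -c checks without sorting; -s skips the whole-line tie-break for equal keys.
gtf_is_sorted() {
    grep -v '^#' "$1" | LC_ALL=C sort -c -s -t "$(printf '\t')" -k1,1 -k4,4n 2>/dev/null
}

# Tiny made-up gtf with its two records out of order.
tab=$(printf '\t')
demo_gtf=$(mktemp)
{
    echo "#!genome-build GRCh38"
    printf '1\tdemo\texon\t500\t600\t.\t-\t.\tgene_id "G1";\n'
    printf '1\tdemo\texon\t100\t200\t.\t-\t.\tgene_id "G1";\n'
} > "${demo_gtf}"

if gtf_is_sorted "${demo_gtf}"; then before="sorted"; else before="unsorted"; fi

# Same approach as the sort step in the script above: header lines first, then sorted records.
sorted_gtf=$(mktemp)
(grep '^#' "${demo_gtf}"; grep -v '^#' "${demo_gtf}" | LC_ALL=C sort -t "${tab}" -k1,1 -k4,4n) > "${sorted_gtf}"

if gtf_is_sorted "${sorted_gtf}"; then after="sorted"; else after="unsorted"; fi
echo "before: ${before}, after: ${after}"
```

On a real annotation the call would look like gtf_is_sorted Homo_sapiens.GRCh38.105.sorted.gtf, with the exit status deciding whether to proceed.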

Best, Woody

CherWeiYuan commented 2 years ago

Hi @wososa

I re-ran the analysis and the new output files have drastically different sizes. Thank you for your help!

Best Regards Wei Yuan