rishavray / PILFER

piRNA cluster prediction and analysis framework

I cannot locate ncrna.bed or retro-transposons.bed anywhere in the zip file; I assume the script you provided follows your own folder layout #1

Open frimz opened 5 years ago

frimz commented 5 years ago

Adapter trimming
Running Collapse
Reading from file ./trimmed/HI.3965.005.RPI1.1-1_R1.fastq.gz
Processing reads
Original number of reads: 16651798
collapsed number of reads: 732060
Mapping to hg19
piRNA alignment
Filtering SAM
[samopen] SAM header is present: 26719 sequences.
HI.3965.005.RPI1.1-1_R1
[samopen] SAM header is present: 26719 sequences.
[samopen] SAM header is present: 26719 sequences.
rm: cannot remove './hg19_sam/HI.3965.005.RPI1.1-1_R1.sam': No such file or directory
Making BED files
Error: Unable to open file /data/sata_data/home/tmhrnaseq/pirna/ncrna/ncrna.bed. Exiting.
Traceback (most recent call last):
  File "./tools/pilfer.py", line 2, in <module>
    import numpy as np
ImportError: No module named numpy
python: can't open file 'union.py': [Errno 2] No such file or directory
Merging clusters
Traceback (most recent call last):
  File "./tools/merge_cluster.py", line 16, in <module>
    next(csvin)
StopIteration
TE profiling
HI.3965.005.RPI1.1-1_R1
Error: Unable to open file /data/sata_data/home/tmhrnaseq/pirna/retro-transposons.bed. Exiting.
Traceback (most recent call last):
  File "./tools/merge.py", line 7, in <module>
    row1 = csvin.next()
StopIteration
Traceback (most recent call last):
  File "./tools/merge.py", line 7, in <module>
    row1 = csvin.next()
StopIteration
Traceback (most recent call last):
  File "./tools/merge.py", line 7, in <module>
    row1 = csvin.next()
StopIteration
Traceback (most recent call last):
  File "./tools/merge.py", line 7, in <module>
    row1 = csvin.next()
StopIteration

rishavray commented 5 years ago

I'm guessing that you are trying to run the script from the Supplementary files of the paper. The script in this repository is updated and should work without the directory structure error. I have annotated it, and I would recommend using this one rather than the Supplementary zip. You will still have to update a few things in the script, such as your root directories and the number of samples. The gzip file containing the human dataset includes the file gencode28-human-ncRNA.bed, whose path you need to add to the script. You also need to extract the zipped transposon file in the tools directory and add its path to the script. I'm afraid this is not drop-in software and needs a bit of setup: it is primarily a collection of tools that I wrote, and I did not have time to put them together elegantly, so it takes a little work to get going. You might also need to update the data if you want to use a genome assembly other than hg19.
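Since most of the failures in the log above trace back to a missing input file or a missing Python module, a small pre-flight check before launching the pipeline can save a run. The following is only a sketch under assumptions: the function name `preflight` and the example paths are hypothetical and not part of PILFER, and the check is written for Python 3, whereas the tracebacks above suggest the tools were run under Python 2, so the module check only reflects the interpreter that runs this sketch.

```python
import importlib.util
from pathlib import Path


def preflight(required_files, required_modules=("numpy",)):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    for path in required_files:
        # The annotation BED files must exist before the BED/TE steps can run.
        if not Path(path).is_file():
            problems.append(f"missing file: {path}")
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python module: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical locations; replace them with the paths set in your copy of the script.
    for issue in preflight([
        "data/gencode28-human-ncRNA.bed",
        "tools/retro-transposons.bed",
    ]):
        print(issue)
```

If the list comes back empty, the paths configured in the script at least point to real files; it says nothing about whether their contents match the genome assembly in use.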

frimz commented 5 years ago

Hey Rishav,

Thank you for your email; it is highly appreciated. I will download the new script now. Fingers crossed it works; otherwise I might ask for your assistance again.

Cheers Farha

-- Farha Ramzan Liggins Institute,

The University of Auckland,

85 Park Road, Grafton,

Private Bag 92019

Auckland 1023

New Zealand

Mobile: 022-4505345

Email: f.ramzan@auckland.ac.nz

frimz commented 5 years ago

Hi Rishav,

The script now seems to run smoothly. However, I am a bit confused about the transposons. In the script you mention using the fasta file from the previous TE_profiling step:

Similarly we created a bowtie 2 index of the transposons we used in the previous step. You can obtain the fasta using the getfasta module of bedtools and then create an index for it to map

bowtie2 -p 8 -k 100 --local -x -f -U $out_dir_prefix"all_pirna.fa" --nofw --no-unal --no-hd -S $out_dir_prefix"known_putative.sam" 2> $out_dir_prefix"logs/TE_target.txt"

Could you please tell me which fasta file I should use to build the bowtie2 index? bedtools getfasta needs a BED file and a fasta file as input, so is retro-transposons.bed the required BED file?

Also, I can see all the nice graphs and figures in your paper. Will this pipeline produce them as well, or do I need to use another program?

I am very new to this, and I assume it should be a very straightforward thing.

Your help is much appreciated.

Cheers Farha

rishavray commented 5 years ago

Hi Farha,

The fasta file in this case is the hg19 human genome sequence, and yes, the BED file is the retro-transposon BED. bedtools getfasta extracts the transposon sequences from the genome, and the piRNAs are then aligned against them. I'm afraid this script will not generate the graphs and figures shown in the paper, but it does generate the data needed to make them. The figures were painstakingly generated using R and Circos. My intention was to keep this program modular so that it can suit different analytical questions rather than be a black box, so you will have to come up with your own visualization scheme for the results. Thanks for the feedback; let me know if I can update anything in the repo.
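The two preparation steps described here (extract the transposon sequences, then index them) can be sketched as a pair of commands. This is a minimal sketch, not the pipeline's own code: the helper name `transposon_index_commands` and all file names are hypothetical, and it only builds the argument lists for `bedtools getfasta` (`-fi`, `-bed`, `-fo`) and `bowtie2-build` without running them.

```python
import shlex


def transposon_index_commands(genome_fasta, transposon_bed, out_fasta, index_prefix):
    """Build the two commands as argument lists; nothing is executed here."""
    # Step 1: cut the BED intervals out of the genome fasta.
    getfasta = [
        "bedtools", "getfasta",
        "-fi", genome_fasta,
        "-bed", transposon_bed,
        "-fo", out_fasta,
    ]
    # Step 2: index the extracted sequences so bowtie2 can map against them.
    build = ["bowtie2-build", out_fasta, index_prefix]
    return [getfasta, build]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own hg19 fasta and BED paths.
    for cmd in transposon_index_commands(
        "hg19.fa", "retro-transposons.bed", "transposons.fa", "transposon_index"
    ):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once bedtools and bowtie2 are installed, and the resulting index prefix is what the `-x` option of the bowtie2 mapping command expects. Whether strand-aware extraction (the `-s` option of `bedtools getfasta`) should be added is not stated in this thread.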

frimz commented 5 years ago

Dear Rishavray/Pilfer

Thank you for all your work throughout. The pipeline seems to be working now and I can get the cluster output files. However, I was wondering how I can get the output for known piRNAs, since your methods section mentions that you obtained results for both known and putative piRNAs. Could you please help me with that? Should I be using a different pipeline, or will this one give me the answers?

Kind Regards Farha Ramzan

rishavray commented 5 years ago

Hi Farha,

You can get the BED files of all the piRNAs in the bed directory that is created in your working directory. It contains the BED files of all the known and putative piRNAs along with their genomic locations and counts. The script should also create two fasta files, known.fa and putative.fa, which contain the piRNA sequences. Alternatively, if you know how to work with SAM/BAM files using samtools, you can get all the information from the BAM files in the pirna_sam directory, which give you the piRNA IDs and the reads mapping to them. I hope that works for you.
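For a quick first look at known.fa and putative.fa, the sequences can be read and summarised with a few lines of Python. This is a minimal sketch that assumes the files are ordinary FASTA; the helper names `read_fasta` and `length_distribution` are hypothetical and not part of PILFER.

```python
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_distribution(path):
    """Count how many sequences there are of each length."""
    return Counter(len(seq) for _, seq in read_fasta(path))


if __name__ == "__main__":
    # File names as mentioned in the thread; adjust the paths to your output directory.
    for name in ("known.fa", "putative.fa"):
        try:
            print(name, sorted(length_distribution(name).items()))
        except FileNotFoundError:
            print(name, "not found")
```

A length distribution is one simple sanity check on the output; the counts per genomic location are in the BED files, whose column layout is not described in this thread.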

Best, Rishav

vivekruhela commented 5 years ago

I would like to ask a few questions regarding your pipeline:

  1. How did you generate pirna_unique.fa? What are the issues with the original piRNA database? Does it contain duplicated sequences?

  2. Why did you set the -k parameter to 300, which as I understand it allows up to 300 biological replicates? Small-RNA sequencing pipelines usually avoid such high values and limit it to 100.