kacst-bioinfo-lab / labwork


Generate distribution of conserved noncoding elements and noncoding spacers of Drosophila genomes #6

Closed MManee closed 6 years ago

MManee commented 7 years ago

For newcomers to GitHub, thread #1 should be useful!

amer1404 commented 7 years ago

For the coding step:

bedtools intersect -a reads.bed -b genes.bed > coding.bed

For the noncoding step:

bedtools complement -i coding.bed -g genes_non.bed > non-coding.bed
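To make the complement step concrete, here is a minimal Python sketch of what a complement over BED-style intervals computes. It is an illustration on toy data, not the bedtools implementation; the function name `complement`, the `sizes` dictionary, and the coordinates are all hypothetical.

```python
from collections import defaultdict


def complement(intervals, chrom_sizes):
    """Return the regions of each chromosome not covered by `intervals`.

    intervals: iterable of (chrom, start, end), 0-based half-open as in BED.
    chrom_sizes: dict mapping chromosome name to its length.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    gaps = []
    for chrom in sorted(chrom_sizes):
        cursor = 0  # first position not yet accounted for
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)  # also handles overlapping inputs
        if cursor < chrom_sizes[chrom]:
            gaps.append((chrom, cursor, chrom_sizes[chrom]))
    return gaps


# Toy data, not taken from the files used in this thread.
coding = [("chr2L", 10, 20), ("chr2L", 15, 30), ("chr2L", 50, 60)]
sizes = {"chr2L": 100}
noncoding = complement(coding, sizes)
print(noncoding)
```

The sketch needs the chromosome lengths to close the last gap on each chromosome, which is why a table of chromosome sizes is passed alongside the intervals.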

MManee commented 7 years ago
amer1404 commented 7 years ago

To verify the results in the previous files, I calculated the total coding and noncoding lengths and compared their sum with the chromosome lengths. The totals did not match.

To solve this problem, we need to try the following steps:

1- Find the coding and noncoding data from the original files using bedtools.

coding_subtract.txt non_coding_subtract.txt

bedtools intersect -a reads.bed -b genes.bed > coding.bed

2- Find the noncoding data using bedtools.

bedtools complement -i coding.bed -g genes_non.bed > non-coding.bed

3- Apply the solution from the previous issues that fixes the complement error.

4- Find the coding data again from the noncoding data by running the complement command.

5- Run the subtract command on both files.

Result of the calculation:

total noncoding length      : 139034833
total coding length         : 29701678
total coding + noncoding    : 168736511

gene length :168736537
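The reported coding and noncoding totals sum to 168736511, which is 26 short of 168736537. A small sketch of how such totals can be computed from BED lines follows. It assumes 0-based, end-exclusive coordinates and non-overlapping intervals within each file; `total_length` and the toy lines are hypothetical and are not the Perl code used in this thread.

```python
def total_length(bed_lines):
    """Sum end - start over BED lines (0-based start, end exclusive)."""
    total = 0
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        total += end - start
    return total


# Toy intervals covering a 100 bp chromosome, not real data.
toy_coding = ["chrX\t0\t40", "chrX\t70\t100"]
toy_noncoding = ["chrX\t40\t70"]
coding_len = total_length(toy_coding)
noncoding_len = total_length(toy_noncoding)
print(coding_len, noncoding_len, coding_len + noncoding_len)
```

If the two files partition the chromosome exactly, the two totals add up to the chromosome length, which is the check described above.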

MManee commented 7 years ago
amer1404 commented 7 years ago

Please find the intronic data and Perl code attached.

code+data.tar.gz

amer1404 commented 7 years ago

Latest update: data.tar.gz

amer1404 commented 7 years ago

These are the final steps and results of the Perl code:

1- Downloaded the chromosome (genome) file ==> sorted_genome.txt

2- Sort allgenomefile.txt into sorted_genome.txt:

sort -k 1,1 -k2,2n allgenomefile.txt > sorted_genome.txt

3- Sort dm3_exons.txt into sorted_dm3_exons.bed:

sort -k 1,1 -k2,2n dm3_exons.txt > sorted_dm3_exons.bed

4- Create a new genome file with valid entries using Perl.

5- Create the noncoding file:

bedtools complement -i sorted_dm3_exons.bed -g sorted_genome > noncoding.bed

6- Create the coding file by copying sorted_dm3_exons.bed to coding.bed.

7- Add 0 as the start column to the last genome file using Perl code.

8- bedtools subtract -a sorted_genome.bed -b noncoding.bed > coding.bed

9- Create the intronic file from the coding file, without the +1/-1 adjustment.

10- system("bedtools subtract -a ../data/noncoding.bed -b ../data/intronic.bed > ../data/intergenic.bed");

11- Calculate the length of intronic + intergenic = noncoding length.

12- Calculate the length of noncoding + coding = total length of genome.

Results:

total length of coding                   : 29701704
total length of noncoding                : 139034833
total length of noncoding and coding     : 168736537
total length of genome                   : 168736537

total length of intronic                 : 109319658
total length of intergenic               : 29715175
total length of intergenic and intronic  : 139034833
total length of noncoding                : 139034833
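Step 10 above derives the intergenic regions by subtracting the intronic intervals from the noncoding intervals. As a rough illustration of what that subtraction does, here is a minimal Python sketch on toy data. It is not the bedtools implementation, the `subtract` helper and coordinates are hypothetical, and the nested scan is quadratic, so it is only suitable for small examples.

```python
def subtract(a_intervals, b_intervals):
    """Remove every part of a_intervals that overlaps b_intervals.

    Both lists hold (chrom, start, end) tuples, 0-based half-open.
    """
    result = []
    for chrom, a_start, a_end in a_intervals:
        cursor = a_start  # first position of A not yet emitted or removed
        overlapping = sorted(
            (s, e)
            for c, s, e in b_intervals
            if c == chrom and s < a_end and e > a_start
        )
        for b_start, b_end in overlapping:
            if b_start > cursor:
                result.append((chrom, cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < a_end:
            result.append((chrom, cursor, a_end))
    return result


# Toy intervals, not taken from the dm3 data.
noncoding = [("chr3R", 0, 100), ("chr3R", 200, 300)]
intronic = [("chr3R", 20, 40), ("chr3R", 250, 400)]
intergenic = subtract(noncoding, intronic)
print(intergenic)
```

Summing `end - start` over the result and over the intronic portion that lies inside the noncoding intervals gives the same kind of "intronic + intergenic = noncoding" check as step 11.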

labwork.tar.gz

MManee commented 7 years ago

This is so awesome! Well done, Amer. The next step is to generate conserved noncoding elements (CNEs) and noncoding spacers of intronic and intergenic regions.
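One way to picture this next step, as a sketch only: clip a set of conserved elements to the intronic (or intergenic) regions to get the CNEs, and treat whatever remains of each region as spacer. The source of the conserved elements is not stated in this thread, so the `conserved` list below is a hypothetical placeholder, as are the `intersect` helper and all coordinates. The sketch assumes neither input list overlaps itself.

```python
def intersect(regions, elements):
    """Return the parts of `elements` that fall inside `regions`.

    Both lists hold (chrom, start, end) tuples, 0-based half-open.
    """
    clipped = []
    for chrom, r_start, r_end in regions:
        for e_chrom, e_start, e_end in elements:
            if e_chrom != chrom:
                continue
            start, end = max(r_start, e_start), min(r_end, e_end)
            if start < end:  # keep only non-empty overlaps
                clipped.append((chrom, start, end))
    return sorted(clipped)


def length(intervals):
    """Total number of bases covered, assuming no overlaps."""
    return sum(end - start for _, start, end in intervals)


# Toy data; `conserved` stands in for whatever conserved-element track is used.
intronic = [("chr2R", 100, 200), ("chr2R", 300, 400)]
conserved = [("chr2R", 150, 220), ("chr2R", 390, 395), ("chr2R", 500, 520)]
intronic_cnes = intersect(intronic, conserved)
spacer_length = length(intronic) - length(intronic_cnes)
print(intronic_cnes, spacer_length)
```

Under this definition the CNE and spacer lengths add up to the region length by construction, which gives a simple sanity check on the real data.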

atalgarni commented 7 years ago

Alrighty, I just started ... so far I have just edited the chromInfo file. The code can be found at https://github.com/kacst-bioinfo-lab/labwork/tree/master/code/abdulmalek/drosphila.py

To make things easier and reproducible, I left the input to the user, as seen in lines 7-8. However, I am still stuck on how to automate this with os.system at lines 30-31.

For the algorithm:

1- Add a new column with 0 for all rows in chromInfo.txt, and convert the file into BED format.

2- Sort chromInfo.bed using the system sort | uniq functions.

3- Sort dm3_exons.txt using the sort | uniq functions.
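The chromInfo steps above could be sketched in Python roughly as follows. This is not the code in drosphila.py. It assumes each chromInfo row starts with a chromosome name and a size, and the function names and toy rows are hypothetical. For calling the system `sort`, `subprocess.run` with an argument list is shown as one possible alternative to `os.system`.

```python
import subprocess


def chrom_info_to_bed(lines):
    """Turn 'chrom<TAB>size[...]' rows into sorted, unique BED rows."""
    rows = set()  # the set drops duplicate rows, like `uniq` after `sort`
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        chrom, size = fields[0], int(fields[1])
        rows.add((chrom, 0, size))  # 0 is the added start column
    return [f"{c}\t{s}\t{e}" for c, s, e in sorted(rows)]


def sort_bed_file(in_path, out_path):
    """Run `sort -k1,1 -k2,2n` on a file without going through a shell."""
    with open(out_path, "w") as out:
        subprocess.run(
            ["sort", "-k1,1", "-k2,2n", in_path], stdout=out, check=True
        )


# Toy rows; the third column is a placeholder, not a real path.
toy = ["chrX\t500\t/path/x", "chr2L\t1000\t/path/2L", "chrX\t500\t/path/x"]
bed_rows = chrom_info_to_bed(toy)
print("\n".join(bed_rows))
```

Python sorts strings by code point, so the order of `bed_rows` may differ from the system `sort` under some locale settings.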

For the output files, please check:

https://github.com/kacst-bioinfo-lab/labwork/tree/master/doc/abdulmalek/chromInfo.bed

https://github.com/kacst-bioinfo-lab/labwork/tree/master/doc/abdulmalek/chromSort.bed

https://github.com/kacst-bioinfo-lab/labwork/tree/master/doc/abdulmalek/dm3_exonsSort.bed

@MManee @amer1404 @SULTAN-ALHARBI

Cheers,

Maali055 commented 7 years ago

Print "Hello Group! \n";

I would like to thank Dr. Manee for involving me in the group. I hope to be a positive addition to the group, and I will certainly learn a lot from you.

MManee commented 7 years ago
amer1404 commented 7 years ago

Hello everyone,

Please find the final results below. Tomorrow I will upload all related files.

Total length of coding                         : 29701704
Total length of noncoding                      : 139034833
Total length of noncoding and coding           : 168736537
Total length of genome                         : 168736537

Total length of intronic                       : 109319658
Total length of intergenic                     : 29715175
Total length of intergenic and intronic        : 139034833
Total length of noncoding                      : 139034833

*************************************************************************************
Total length of coding CEs                     : 18219044
Total length of coding spacer                  : 11482662
Total length of coding CEs and spacer          : 29701706
Total length of coding                         : 29701704

*************************************************************************************
Total length of intergenic CNEs                : 2744358
Total length of intergenic spacer              : 26970817
Total length of intergenic CNEs and spacer     : 29715175
Total length of intergenic                     : 29715175

*************************************************************************************
Total length of intronic CNEs                  : 31527418
Total length of intronic spacer                : 77792240
Total length of intronic CNEs and spacer       : 109319658
Total length of intronic                       : 109319658

*************************************************************************************
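The totals above can be cross-checked mechanically. The sketch below copies the figures from this comment and reports, for each category, how far the sum of its parts is from the stated total. The `check_partition` helper is hypothetical and is not part of the Perl code.

```python
def check_partition(parts, whole):
    """Return sum(parts) - whole; 0 means the parts add up exactly."""
    return sum(parts) - whole


# Figures copied from the results posted in this thread.
reported = {
    "coding": ([18219044, 11482662], 29701704),
    "intergenic": ([2744358, 26970817], 29715175),
    "intronic": ([31527418, 77792240], 109319658),
    "noncoding": ([109319658, 29715175], 139034833),
    "genome": ([29701704, 139034833], 168736537),
}

differences = {
    name: check_partition(parts, whole)
    for name, (parts, whole) in reported.items()
}
for name, diff in differences.items():
    print(f"{name:<12} difference: {diff}")
```

With these figures every category adds up exactly except coding, where the CEs and spacer sum to 29701706, 2 more than the stated coding total of 29701704.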
MManee commented 7 years ago

It looks perfect. We will need to check the datasets using genome browser. Well done Amer.

amer1404 commented 7 years ago

Hello all, :)

Please find the updated files in my folders (data and code).