Closed mult1fractal closed 4 years ago
output from tool:
name len score pvalue
ctg13 len=5789 5789 0.9999746084213257 0.0008686456681018204
ctg11 len=6697 6697 0.6740242838859558 0.03518014955812373
ctg12 len=4644 4644 0.9999679923057556 0.0010385980814260896
ctg6 len=10420 10420 0.8116341829299927 0.024303195105370497
ctg14 len=21838 21838 0.9995636940002441 0.0027758894176297304
ctg3 len=56341 56341 0.655927836894989 0.03663418687212025
ctg1 len=77616 77616 0.999999463558197 0.0003399048266485384
ctg8 len=8713 8713 0.9723275899887085 0.010423748017221844
ctg10 len=8641 8641 1.0 0.0
ctg9 len=9716 9716 1.0 0.0
ctg2 len=45788 45788 0.17057377099990845 0.15701714631014427
ctg4 len=18684 18684 0.999731719493866 0.002266032177656923
ctg5 len=11233 11233 0.08206796646118164 0.20190346702923181
ctg7 len=11046 11046 0.9856719970703125 0.008422086260291563
export LC_NUMERIC=en_US.utf-8
###### print contig only ########
mkdir sorted_contig_only
sort -g -k4,4 *.txt | awk '$4>=0.995' | awk '{ print $1 }' | tail -n+2 > sorted_contig_only/sorted_contig_only.txt
Output of the filter script (sorted_contig_only.txt):
ctg14 ctg4 ctg12 ctg13 ctg1 ctg10 ctg9
Sorts the contigs by score in ascending order [most likely hit at the bottom, less likely at the top; threshold: only contigs with a score of at least 0.995 are written to the new file]
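The same filtering can also be sketched with an explicit header skip and a forced numeric comparison, so the result does not depend on how the header line happens to sort. This is only a sketch: `filter_by_score` is a hypothetical helper name, the demo file is a four-row excerpt of the tool output above, and it assumes the score is the 4th whitespace-separated field (because the contig name itself contains a space).

```shell
# Hypothetical helper (not part of the original script): print the names of
# all contigs whose score (4th whitespace-separated field) is >= a threshold,
# ordered by ascending score.
filter_by_score() {
    threshold="$1"
    file="$2"
    # NR > 1 skips the header line; adding 0 forces a numeric comparison.
    LC_ALL=C awk -v t="$threshold" 'NR > 1 && ($4 + 0) >= (t + 0) { print $4, $1 }' "$file" |
        LC_ALL=C sort -g -k1,1 |
        awk '{ print $2 }'
}

# Demo input: a few rows copied from the tool output above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
name len score pvalue
ctg13 len=5789 5789 0.9999746084213257 0.0008686456681018204
ctg11 len=6697 6697 0.6740242838859558 0.03518014955812373
ctg14 len=21838 21838 0.9995636940002441 0.0027758894176297304
ctg10 len=8641 8641 1.0 0.0
EOF

filter_by_score 0.995 "$demo"
```

With these four rows the call prints ctg14, ctg13 and ctg10, one per line. Passing the threshold as an argument makes it easier to try other cut-offs without editing the pipeline.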
output from tool
Header,Length,phage_score,chromosome_score,plasmid_score,Possible_source
ctg1 len=77616,77616,0.999063273086618,4.2748996882179e-05,0.000893977561595013,phage
ctg2 len=45788,45788,0.743730796490464,0.0693620524584483,0.186907154209264,phage
ctg3 len=56341,56341,0.650433895549219,0.19426980099404,0.155296298583838,phage
ctg4 len=18684,18684,0.947390473455516,0.00273918368146794,0.0498703268125154,phage
ctg5 len=11233,11233,0.12364929646176,0.51737839232397,0.358972288526878,chromosome
ctg6 len=10420,10420,0.162218442538976,0.205961024849863,0.631820543027428,plasmid
Output of the filter script (sorted_contig_only.txt):
ctg1 ctg2 ctg3 ctg4 ctg7 ctg9 ctg10 ctg12 ctg13 ctg14
export LC_NUMERIC=en_US.utf-8
###### print contig only ########
mkdir sorted_contig_only
tail -n+2 *.csv | grep 'phage'| cut -d ' ' -f1 > sorted_contig_only/sorted_contig_only.txt
Produces a contig file containing only the contigs that were marked as phage.
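A slightly stricter variant could match only the last CSV column instead of grepping for 'phage' anywhere on the line. The following is a minimal sketch under assumptions: `phage_contigs` is a hypothetical helper name, the demo rows are copied from the tool output above, and the file is assumed to use plain LF line endings with `Possible_source` as the last column.

```shell
# Hypothetical helper: print the contig name of every row whose last
# comma-separated column (Possible_source) is exactly "phage".
phage_contigs() {
    # NR > 1 skips the CSV header; the first column looks like
    # "ctg1 len=77616", so only the part before the space is printed.
    awk -F',' 'NR > 1 && $NF == "phage" { split($1, name, " "); print name[1] }' "$1"
}

# Demo input: a few rows copied from the tool output above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
Header,Length,phage_score,chromosome_score,plasmid_score,Possible_source
ctg1 len=77616,77616,0.999063273086618,4.2748996882179e-05,0.000893977561595013,phage
ctg5 len=11233,11233,0.12364929646176,0.51737839232397,0.358972288526878,chromosome
ctg6 len=10420,10420,0.162218442538976,0.205961024849863,0.631820543027428,plasmid
EOF

phage_contigs "$demo"
```

For the three demo rows this prints only ctg1. Matching on the column rather than the whole line avoids false hits if the word ever appears in a contig name.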
output from tool:
name length score pvalue
1 ctg1 len=77616 77616 0.999054353 0.0001212709
9 ctg9 len=9716 9716 0.999237847 0.0001212709
4 ctg4 len=18684 18684 0.997795520 0.0004850837
10 ctg10 len=8641 8641 0.998109694 0.0004850837
14 ctg14 len=21838 21838 0.946717594 0.0073975261
7 ctg7 len=11046 11046 0.943974291 0.0078826098
12 ctg12 len=4644 4644 0.940076348 0.0087315062
13 ctg13 len=5789 5789 0.875674561 0.0170991996
11 ctg11 len=6697 6697 0.760645336 0.0351685666
2 ctg2 len=45788 45788 0.518028866 0.0835556634
3 ctg3 len=56341 56341 0.371373644 0.1302449673
6 ctg6 len=10420 10420 0.122134031 0.3022071307
8 ctg8 len=8713 8713 0.048813106 0.4559786563
5 ctg5 len=11233 11233 0.001284177 0.9279650740
Output filter script: ctg4 ctg10 ctg1 ctg9
export LC_NUMERIC=en_US.utf-8
###### print contig only ########
mkdir sorted_contig_only
sort -k5,5 *.txt | awk '$5>=0.75' | awk '{ print $2 }' > sorted_contig_only/sorted_contig_only.txt
The upper part of the VirFinder Nextflow output still needs to be deleted; after that, the script yields a clean contig file.
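One possible way to avoid deleting the leading lines by hand is to let awk accept only lines that look like data rows. This is a sketch under assumptions: `virfinder_contigs` is a hypothetical helper name, data rows are assumed to have exactly six whitespace-separated fields (row index, contig, `len=...`, length, score, pvalue), and the preamble line in the demo file is a placeholder, not real Nextflow output.

```shell
# Hypothetical helper: print the contig name (2nd field) of every data row
# whose score (5th field) is >= a threshold, ignoring any other lines.
virfinder_contigs() {
    threshold="$1"
    file="$2"
    LC_ALL=C awk -v t="$threshold" '
        # Data rows have six fields: index, contig, len=..., length, score, pvalue.
        NF == 6 && $5 ~ /^[0-9.eE+-]+$/ && ($5 + 0) >= (t + 0) { print $2 }
    ' "$file"
}

# Demo input: a placeholder preamble line, the header, and three rows
# copied from the tool output above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
placeholder preamble line (hypothetical)
name length score pvalue
1 ctg1 len=77616 77616 0.999054353 0.0001212709
11 ctg11 len=6697 6697 0.760645336 0.0351685666
2 ctg2 len=45788 45788 0.518028866 0.0835556634
EOF

virfinder_contigs 0.75 "$demo"
```

With the demo file and a threshold of 0.75 this prints ctg1 and ctg11 in input order. The preamble and the header are dropped because they do not have six fields with a numeric fifth field.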
Welcome to the MARVEL Tool!
Please cite: Amgarten DE, Braga LP, Da Silva AM, Setubal JC. MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins. Frontiers in Genetics. 2018;9:304.
Arguments are OK. Checked the input folder and found 1 bins.
2019-09-06 11:59:19.650975
Prokka has started, this may take awhile. Be patient.
Prokka tasks have finished!
2019-09-06 11:59:23.428158
Starting HMM scan, this may take awhile. Be patient.
2019-09-06 12:00:38.786560
Extracting features from bins...
WARNING: HIADMOBI_5 has 1 or zero appropriate CDS features (those are important for prediction).
Extracted features from 1 bins
2019-09-06 12:00:38.862441
Doing the machine learning prediction...
Found phages in this sample!!!
**Bins predicted as phages and probabilities according to Random Forest algorithm:
* OX2_draft -> 100.0 %
Finished Machine learning predictions!
Bins predicted as phages are in the folder: fasta_dir_OX2_draft/results/phage_bins/
2019-09-06 12:00:39.102674
**Thank you for using Marvel!
Output filter script: OX2_draft
export LC_NUMERIC=en_US.utf-8
###### print contig only ########
mkdir sorted_contig_only
grep '>' *.txt |awk '$4>=75.0' |awk '{print $2 }' > sorted_contig_only/sorted_contig_only.txt
output from tool:
/foobar/Predicted_viral_sequences/VIRSorter_cat-3.fasta
#for MinION data
cd Predicted_viral_sequences
#category 1
mkdir sorted_contig_only_cat_1
cat *1.fasta | grep '>' | cut -d '_' -f2 > sorted_contig_only_cat_1/sorted_contig_only.txt
#category 2
mkdir sorted_contig_only_cat_2
cat *2.fasta | grep '>' | cut -d '_' -f2 > sorted_contig_only_cat_2/sorted_contig_only.txt
#category 3
mkdir sorted_contig_only_cat_3
cat *3.fasta | grep '>' | cut -d '_' -f2 > sorted_contig_only_cat_3/sorted_contig_only.txt
* output:
ctg1
#### Problem:
* The sequence names in the CRC test file are structured differently, hence a slightly modified cut command
* Structure of the CRC test file: `>VIRSorter_k99_2780941_flag=1_multi=1_0000_len=4161-cat_3`
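To see why the field selection has to change, the header from the CRC test file can be split on `_` directly. This is only an illustration; the variable names are hypothetical and the header string is the one quoted above.

```shell
# Header line as quoted above for the CRC test file.
header='>VIRSorter_k99_2780941_flag=1_multi=1_0000_len=4161-cat_3'

# Field 2 alone gives only the "k99" prefix, which is not a unique contig name.
short_id=$(printf '%s\n' "$header" | cut -d '_' -f2)

# Fields 2 and 3 together give the full contig name.
full_id=$(printf '%s\n' "$header" | cut -d '_' -f2,3)

printf '%s\n%s\n' "$short_id" "$full_id"
```

This prints k99 followed by k99_2780941, which is why `-f2,3` is used for this naming scheme.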
```Shell-Script
#for CRC
cd Predicted_viral_sequences
#category 1
mkdir sorted_contig_only_cat_1
cat *1.fasta | grep '>' | cut -d '_' -f2,3 > sorted_contig_only_cat_1/sorted_contig_only.txt
#category 2
mkdir sorted_contig_only_cat_2
cat *2.fasta | grep '>' | cut -d '_' -f2,3 > sorted_contig_only_cat_2/sorted_contig_only.txt
#category 3
mkdir sorted_contig_only_cat_3
cat *3.fasta | grep '>' | cut -d '_' -f2,3 > sorted_contig_only_cat_3/sorted_contig_only.txt
```
@Stormrider935 status? this can be closed or? please ref the commits
Metaphinder
Output from tool:
Output of the filter script (sorted_contig_only.txt):
ctg7 ctg11 ctg10 ctg14 ctg1 ctg9 ctg4
Writes the contigs classified as 'phage' to a new file.