Closed by bytewife 1 year ago
Okay, the output for both seqlet-bed and seqlet-fasta should be correct this time.
I've checked the generated FASTA against bedtools getfasta as follows:
$ bedtools getfasta -fi examples/ENCSR000EGM/data/hg38.fa -bed modisco_results.bed -fo test.fa.out
which outputs:
$ head test.fa.out
>chr8:106015452-106015481
ttcaagaatattaattagaatacaaatat
>chr8:28986534-28986563
AATTTGAAGGCTATCACCTATCTACAGAA
>chr8:46594384-46594413
AAAAACAAATAAACACATGAAAAACCTCt
>chr8:15668501-15668530
actagcacgtgagccctgcccacagggac
>chr8:33173027-33173056
TGGAAAGTTCTAACCCTTCCCATCATTCC
which aligns with the generated seqlet-fasta output (the sequences are identical apart from letter case):
$ modisco seqlet-fasta -i samples/set/spi1_modisco_results.h5 -o modisco_results.fasta -s samples/set/spi1.ohe.npz -p samples/set/peaks.bed --windowsize 2114 -c chr8
$ head modisco_results.fasta
>chr8:106015452-106015481 dir=- pattern_0.0
TTCAAGAATATTAATTAGAATACAAATAT
>chr8:28986534-28986563 dir=- pattern_0.1
AATTTGAAGGCTATCACCTATCTACAGAA
>chr8:46594384-46594413 dir=- pattern_0.2
AAAAACAAATAAACACATGAAAAACCTCT
>chr8:15668501-15668530 dir=- pattern_0.3
ACTAGCACGTGAGCCCTGCCCACAGGGAC
>chr8:33173027-33173056 dir=- pattern_0.4
TGGAAAGTTCTAACCCTTCCCATCATTCC
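To repeat this comparison without eyeballing the two files, a small script along these lines could be used. It is only a sketch and not part of modisco or bedtools: the helper names `read_fasta` and `compare_fasta` are made up for illustration. It assumes records should be matched on the first whitespace-delimited token of each header (so the extra `dir=... pattern_...` fields are ignored) and that soft-masked lowercase bases count as equal to uppercase ones.

```python
def read_fasta(lines):
    """Map each record's first header token to its upper-cased sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only "chr:start-end"; drop any trailing annotations.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def compare_fasta(expected_lines, actual_lines):
    """Return the sorted record names that are missing or differ between the inputs."""
    expected = read_fasta(expected_lines)
    actual = read_fasta(actual_lines)
    names = sorted(set(expected) | set(actual))
    return [n for n in names if expected.get(n) != actual.get(n)]


# First record from each of the two outputs shown above.
bedtools_out = [
    ">chr8:106015452-106015481",
    "ttcaagaatattaattagaatacaaatat",
]
modisco_out = [
    ">chr8:106015452-106015481 dir=- pattern_0.0",
    "TTCAAGAATATTAATTAGAATACAAATAT",
]
print(compare_fasta(bedtools_out, modisco_out))
```

For the real files, the same function accepts open file handles, for example `compare_fasta(open("test.fa.out"), open("modisco_results.fasta"))`; an empty list means every record agrees.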
The coordinates and pattern names in the FASTA headers also match the file generated by seqlet-bed:
$ modisco seqlet-bed -i samples/set/spi1_modisco_results.h5 -o modisco_results.bed -p samples/set/peaks.bed --windowsize 2114 -c chr8
$ head modisco_results.bed
track name="pattern_0" description="TF-MoDISco pattern 'pattern_0' on the positive strand."
chr8 106015452 106015481 pattern_0.0 1000 -
chr8 28986534 28986563 pattern_0.1 1000 -
chr8 46594384 46594413 pattern_0.2 1000 -
chr8 15668501 15668530 pattern_0.3 753 -
chr8 33173027 33173056 pattern_0.4 1000 -
chr8 54507605 54507634 pattern_0.5 1000 -
chr8 128173422 128173451 pattern_0.6 1000 -
chr8 62795869 62795898 pattern_0.7 1000 -
chr8 62761840 62761869 pattern_0.8 1000 -
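The BED and FASTA outputs can be cross-checked the same way. The sketch below is hypothetical helper code, not something shipped with modisco. It assumes each BED row should have a FASTA record whose header starts with `chrom:start-end`, and that BED intervals are half-open, so the sequence length should equal `end - start`. The function names `bed_records` and `check_bed_against_fasta` are illustrative.

```python
def bed_records(lines):
    """Yield (chrom, start, end, name) for each data row, skipping track/browser/comment lines."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        yield fields[0], int(fields[1]), int(fields[2]), fields[3]


def check_bed_against_fasta(bed_lines, fasta_lines):
    """Return a list of human-readable problems; an empty list means the files agree."""
    seqs = {}
    header = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()[0]  # "chrom:start-end"
            seqs[header] = ""
        elif line and header is not None:
            seqs[header] += line

    problems = []
    for chrom, start, end, name in bed_records(bed_lines):
        key = f"{chrom}:{start}-{end}"
        if key not in seqs:
            problems.append(f"{name}: no FASTA record for {key}")
        elif len(seqs[key]) != end - start:
            problems.append(f"{name}: expected {end - start} bp, found {len(seqs[key])}")
    return problems


# First two rows of each output shown above.
bed = [
    "track name=\"pattern_0\" description=\"TF-MoDISco pattern 'pattern_0' on the positive strand.\"",
    "chr8 106015452 106015481 pattern_0.0 1000 -",
    "chr8 28986534 28986563 pattern_0.1 1000 -",
]
fasta = [
    ">chr8:106015452-106015481 dir=- pattern_0.0",
    "TTCAAGAATATTAATTAGAATACAAATAT",
    ">chr8:28986534-28986563 dir=- pattern_0.1",
    "AATTTGAAGGCTATCACCTATCTACAGAA",
]
print(check_bed_against_fasta(bed, fasta))
```

This only checks coordinates and lengths; it does not look at the strand column or the score.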
seqlet-bed and seqlet-fasta. Currently working on formatting. See below for examples.