nodrogluap / OpenDBA

GPU-accelerated Dynamic Time Warp (DTW) Barycenter Averaging

Sequence alignment to a target #8

Closed by epi-gene 3 years ago

epi-gene commented 4 years ago

Hi, I have nanopore data in which most of the fast5 reads (>95%) are non-targets. Would it be possible to identify the reads that match my target sequence using OpenDBA? My target sequences are being flattened out of existence in the final .avg file. Thanks!

nodrogluap commented 4 years ago

Hi, if you can identify the cluster in the pairwise distance matrix (e.g. in R with cutree), you can submit just those data to OpenDBA for consensus. I agree, though, that it would be useful to have OpenDBA automatically generate a consensus for each cluster found within a certain height cutoff, perhaps using Ward's D2 linkage. I've changed this to a feature request, which essentially turns OpenDBA into a de novo signal assembly tool for raw data :-)
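As a rough illustration of the manual route described above, here is a minimal Python sketch of cutting a hierarchical clustering at a height cutoff, analogous to hclust plus cutree in R. It uses SciPy rather than OpenDBA itself. The toy distance matrix, the `cluster_reads` helper, and the cutoff of 2.0 are all assumptions made up for the example, and complete linkage is used as a default because SciPy's Ward method is only well defined for Euclidean distances, which DTW distances generally are not.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_reads(dist, height, method="complete"):
    """Cut a hierarchical clustering of a square distance matrix at `height`.

    Hypothetical helper: returns one integer cluster label per read.
    """
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(dist, checks=True)
    tree = linkage(condensed, method=method)
    # criterion="distance" keeps merges whose height is <= `height`.
    return fcluster(tree, t=height, criterion="distance")


# Toy symmetric distance matrix (made-up values, not real DTW output):
# reads 0-2 are mutually close, reads 3-4 are close to each other.
dist = np.array([
    [0.0, 1.0, 1.2, 9.0, 9.5],
    [1.0, 0.0, 0.8, 9.2, 9.1],
    [1.2, 0.8, 0.0, 8.9, 9.3],
    [9.0, 9.2, 8.9, 0.0, 1.1],
    [9.5, 9.1, 9.3, 1.1, 0.0],
])

labels = cluster_reads(dist, height=2.0)
# Indices of the reads that fall in the same cluster as read 0.
target_members = np.flatnonzero(labels == labels[0])
```

The indices in `target_members` identify the reads that would then be passed on for consensus; how the distance matrix is loaded and how the selected reads are handed to OpenDBA depends on your own files and is not shown here.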

epi-gene commented 4 years ago

That would be great! If OpenDBA can perform assembly and target alignment, it would be really helpful for nanopore datasets. Looking forward to the updates.

nodrogluap commented 3 years ago

OpenDBA now has initial support for averaging clusters found within the dataset (calculated using complete linkage). An extra argument has been added to the command line, as per the README: 1 means "put everything in one cluster", 0 means every sequence is in its own cluster (perhaps not especially useful), and in-between values control the coarseness of the clustering. In datasets with high diversity (e.g. 95% non-target, as in the original issue), for efficiency you may wish to remove the known off-target sequences from the fast5 files before clustering. This can be done using the nanostripper software.

For example, to remove human reads from a viral dataset:

    nanostripper /dev/null hg19.fasta my_dir_containing_fast5_files

epi-gene commented 3 years ago

I tried installing the latest version of OpenDBA, but the build fails.

    make
    nvcc -DDEBUG=0 -DDOUBLE_UNSUPPORTED=0 -DHDF5_SUPPORTED=0 --expt-relaxed-constexpr -rdc=true -maxrregcount 26 --std=c++11 -arch=sm_61 -c openDBA.cu -o openDBA.o
    In file included from /usr/local/cuda/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,
                     from <command-line>:
    /usr/local/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
          |  ^~~~~
    make: *** [Makefile:36: openDBA.o] Error 1

Any suggestions?

nodrogluap commented 3 years ago

This happens because the gcc version in your path (likely 9 or later) is incompatible with the version of CUDA installed on the same system. Please check the following compatibility chart to decide whether to downgrade the compiler or upgrade CUDA, and to which version: https://gist.github.com/ax3l/9489132
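To confirm the mismatch before changing anything, the limit that CUDA reports can be compared with the installed compiler's major version. The Python sketch below only parses text: the error line is copied from the build log in this thread, while the `9.3.0` string stands in for the output of `gcc -dumpversion` and is an assumed value, as are the helper names.

```python
import re


def max_supported_gcc(error_text):
    """Extract the highest gcc major version named in CUDA's host_config.h error."""
    match = re.search(r"gcc versions later than (\d+) are not supported", error_text)
    return int(match.group(1)) if match else None


def gcc_major(dumpversion_output):
    """Major version from `gcc -dumpversion` output such as '9.3.0' or '9'."""
    return int(dumpversion_output.strip().split(".")[0])


# Error line taken from the build log above.
error_line = (
    "error: #error -- unsupported GNU version! "
    "gcc versions later than 8 are not supported!"
)
# Assumed example value; on a real system, run `gcc -dumpversion` to get it.
installed = "9.3.0\n"

limit = max_supported_gcc(error_line)
too_new = limit is not None and gcc_major(installed) > limit
```

If `too_new` is true, the options are the ones given above: install an older gcc or a newer CUDA. nvcc also accepts a `-ccbin` option for naming the host compiler explicitly, although whether the OpenDBA Makefile offers a way to pass it is not covered in this thread.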