lutteropp hakmer-ng-redesign issues


Add --earlyStop option that lets the user specify after how much extracted sequence data we want to stop

#76 lutteropp opened 5 years ago
0
Run hakmer-ng on the supermatrices from Antonis' paper

#75 lutteropp opened 5 years ago
0
Augment blocks with mismatches a bit later? And maybe only if we didn't find enough seeds/ not enough taxa per seed?

#74 lutteropp opened 5 years ago
0
Put very promising seeds on a fast track/ directly process them

#73 lutteropp opened 5 years ago
1
Fix yet another '$' in supermatrix bug

#72 lutteropp closed 5 years ago
1
Compute harmonic mean of average pairwise genome substitution rates

#71 lutteropp closed 5 years ago
1
With same number of taxa, prefer blocks with lower surrounding subRate

#70 lutteropp closed 5 years ago
1
-r option causes segfault on cluster

#69 lutteropp closed 5 years ago
1
Don't search for approximate matches in taxa that were rejected due to paralogy issues

#68 lutteropp closed 5 years ago
0
Perform an iterative search for approximate matches

#67 lutteropp closed 5 years ago
1
Speed up search for approximate matches

#66 lutteropp closed 5 years ago
1
If we have multiple approximate matches in a taxon, don't add any of them

#65 lutteropp closed 5 years ago
1
Make maximum accepted average substitution rate a dynamically chosen parameter

#64 lutteropp opened 5 years ago
1
Implement trimming of already aligned extended block

#63 lutteropp opened 5 years ago
1
Sample taxa more evenly

#62 lutteropp closed 5 years ago
2
Maybe set the flankSize to the seed size?

#61 lutteropp closed 5 years ago
1
Something is wrong with the supermatrix built for the w252 dataset

#60 lutteropp closed 5 years ago
9
Compute statistics about how much sequence data has been extracted from each taxon

#59 lutteropp closed 5 years ago
0
Vectorize the code (very low priority, but still a fun exercise)

#58 lutteropp opened 5 years ago
0
w2016 dataset throws malloc memory corruption error

#57 lutteropp closed 5 years ago
1
Plot sequence data usage for each minimum seed size if we didn't care about overlaps/reuse of sequence data

#56 lutteropp closed 5 years ago
1
Implement elbow criterion

#55 lutteropp closed 5 years ago
0
Improve block priority scoring

#54 lutteropp closed 5 years ago
2
Postpone adding of extracted blocks with all-equal sites in order to favor more informative blocks?

#53 lutteropp closed 5 years ago
2
Adapt flankwidth to fit the kmer seed size - the larger the seed, the larger the flank size can be

#52 lutteropp closed 5 years ago
0
If total amount of extracted sequence data is too low (e.g., lower than 10%), do a second run with lower kmin

#51 lutteropp closed 5 years ago
0
Always write info file

#50 lutteropp closed 5 years ago
0
Add protein data support

#49 lutteropp opened 5 years ago
0
Improve dealing with paralogs

#48 lutteropp opened 5 years ago
4
Perform better block MSA - maybe with MUSCLE?

#47 lutteropp opened 5 years ago
1
Use average number of substitutions

#46 lutteropp closed 5 years ago
4
Increase k in a better way than one by one, e.g. by looking at the lcp-array entries...

#45 lutteropp closed 5 years ago
0
Don't run MSA on the seeds, but really use that trimming information

#44 lutteropp closed 5 years ago
1
Still augment seeds with mismatches, but only after all other seeds have been found?

#43 lutteropp closed 5 years ago
1
Fix missing data statistics computation

#42 lutteropp closed 5 years ago
0
A block should only store its relevant sequence coordinates - do the MSA later on

#41 lutteropp closed 5 years ago
0
Very simple restructuring - if partial extension turns out to be a good idea, then the entire hakmer-ng code can be made MUCH simpler (and probably more efficient, too)...

#40 lutteropp opened 5 years ago
1
Modify site-selection-criteria scripts to deal with split/ non-consecutive partitions

#39 lutteropp opened 5 years ago
0
Improve extension: Add trimmed extension, that is, allow just a subset of the taxa in the block to be extended if some taxa already say stop

#38 lutteropp closed 5 years ago
0
Prune overlapping seeds instead of discarding them

#37 lutteropp closed 5 years ago
0
Store suffix array in a file and check if it's already there

#36 lutteropp closed 5 years ago
0
Make param_ranges input possible in order to not recompute the suffix array every time

#35 lutteropp closed 5 years ago
1
Implement variant: Start with large kmin, then gradually reduce the kmin size...

#34 lutteropp closed 5 years ago
0
Implement variant: Only do approximate match augmentation after all blocks have been chosen and extended

#33 lutteropp closed 5 years ago
2
Implement variant: First take all seeds, then do extensions

#32 lutteropp closed 5 years ago
1
Speed up iterative seeded block extraction by only looking for exact counts and skipping uninteresting suffix array regions

#31 lutteropp closed 5 years ago
1
Think about merging otherwise compatible superseeds, e.g., by pruning one of them

#30 lutteropp opened 5 years ago
0
Prefer the block with larger k-mer size if the number of taxa in two blocks is the same

#29 lutteropp closed 5 years ago
0
Store the per-block-MSAs externally in some file, they won't fit into the RAM for large datasets!

#28 lutteropp closed 5 years ago
0
Improve dealing with reverse complements

#27 lutteropp closed 5 years ago
1