phasegenomics / FALCON-Phase

FALCON-Phase integrates PacBio long-read assemblies with Phase Genomics Hi-C data to create phased, diploid, chromosome-scale scaffolds

Add preprocess script #69

Closed maximilianpress closed 4 years ago

maximilianpress commented 4 years ago

This script adds some functionality beyond scrub_names.pl to handle user error in naming contigs. It also works with a combined p+h fasta.

bnelsj commented 4 years ago

The script looks fine to me. Adding some tests at some point would be a good idea. Is there any reason to use the perl script over this script at this point? If not, is there documentation that should be updated?

maximilianpress commented 4 years ago

@bnelsj I don't see any reason to keep using scrub_names.pl other than general conservatism. I have used this script successfully for FP in the past, though that was a while ago. I am sure that there are docs to update. The README at least mentions the Perl script. I suppose Hayley's ops guide is another thing to update.

As for testing, any testing of FP is overdue. I suppose this script is as good a place to start as any. Will try to get around to it in the next day or two.

So I'll do these things and then merge: 1) add tests for just this script, 2) update the ops guide and README.

Any other thoughts?

maximilianpress commented 4 years ago

Tests for the preprocessing script are now running in CI. Confirmed that the FP gcc build is unaffected. Updated the README and Hayley's ops doc.