j-andrews7 / VAMPIRE

Variant and Epigenetic anNotation for Underlying Significance and Regulation
MIT License

Allow iterative or parallel processing for ChIP-seq data. #27

Closed j-andrews7 closed 7 years ago

j-andrews7 commented 8 years ago

The current plan is to use only one type of ChIP-seq data to analyze "enhancer activity", but it would be useful if multiple datasets could be used in the same way. The challenge would be ensuring that each dataset gets a unique, yet easily identifiable, INFO field in the output.

Computationally, the most efficient way to do this would be to allow users to provide multiple datasets at the command line for a common argument:

-d dataset1.bed dataset2.bed dataset3.bed, or -d dataset1 -d dataset2, or something similar. I'm not sure which is more straightforward to implement in Python. The backend processing would be the same for each dataset, but the INFO fields would need to be named dynamically, perhaps based on a fixed part of the filename. For example:

With -d K27AC.data.bed FAIRE.data.bed, each filename would be split on '.' and the first element used as a prefix for the INFO field, giving K27ACZ: (z-scores for each sample with the variant) and FAIREZ: (z-scores for each sample with the variant).
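
For what it's worth, argparse can accept both command-line forms at once, so the choice may not matter much. The following is only a rough sketch of the idea, not VAMPIRE code: the function names parse_datasets and info_field_names, the --datasets long option, and the "Z" suffix default are all assumptions made for illustration. It also raises an error when two files would produce the same INFO field name, which is one possible way to keep the fields unique.

```python
import argparse
import os


def parse_datasets(argv):
    """Collect dataset paths from one or more -d flags (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # nargs="+" accepts "-d a.bed b.bed"; action="append" accepts "-d a -d b".
    # The "--datasets" long option is an assumption, not an existing flag.
    parser.add_argument("-d", "--datasets", nargs="+", action="append",
                        metavar="BED")
    args = parser.parse_args(argv)
    # Each -d occurrence yields its own list, so flatten them in order.
    return [path for group in (args.datasets or []) for path in group]


def info_field_names(paths, suffix="Z"):
    """Map each derived INFO field name to its dataset file (hypothetical)."""
    fields = {}
    for path in paths:
        # Drop any directory part, then take the text before the first '.'.
        prefix = os.path.basename(path).split(".")[0]
        name = prefix + suffix
        if name in fields:
            # Two files sharing a prefix would collide in the output.
            raise ValueError(
                "duplicate INFO field %s from %s and %s"
                % (name, fields[name], path)
            )
        fields[name] = path
    return fields


if __name__ == "__main__":
    paths = parse_datasets(["-d", "K27AC.data.bed", "FAIRE.data.bed"])
    print(info_field_names(paths))
```

Combining nargs="+" with action="append" means users don't have to remember which form is supported, at the cost of one flattening step after parsing.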

j-andrews7 commented 7 years ago

I'm inclined to shelve this for the time being, as I don't feel it's as necessary/helpful as I originally thought. Potentially worth coming back to in the future.