hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

AUS-229: CUPPA cohort level feature extraction #529

Closed: luan-n-nguyen closed this 3 months ago

luan-n-nguyen commented 4 months ago

Hey Charles,

I've now implemented the cohort level feature extraction.

For each CategoryType, DataItems are extracted from every sample and inserted into a DataItemMatrix (n_features x n_samples, backed by a ConcurrentHashMap). The [DataSource, ItemType, Key] attributes of each DataItem are used as the map keys. Once all samples have been processed, the DataItemMatrix is written to a file.
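To make the layout concrete, here is a minimal sketch of a feature-by-sample matrix backed by a ConcurrentHashMap. It is not the code in this PR: `DataItemMatrixSketch`, `FeatureKey`, and the method names are hypothetical stand-ins, and values are stored as plain strings for simplicity.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the actual DataItemMatrix from this PR.
class DataItemMatrixSketch {

    // Hypothetical stand-in for the [DataSource, ItemType, Key] attributes of a DataItem.
    static final class FeatureKey {
        final String dataSource;
        final String itemType;
        final String key;

        FeatureKey(String dataSource, String itemType, String key) {
            this.dataSource = dataSource;
            this.itemType = itemType;
            this.key = key;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof FeatureKey)) return false;
            FeatureKey that = (FeatureKey) other;
            return dataSource.equals(that.dataSource)
                    && itemType.equals(that.itemType)
                    && key.equals(that.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dataSource, itemType, key);
        }
    }

    private final int nSamples;

    // One row per feature; each row holds one value per sample.
    private final Map<FeatureKey, String[]> rows = new ConcurrentHashMap<>();

    DataItemMatrixSketch(int nSamples) {
        this.nSamples = nSamples;
    }

    // computeIfAbsent creates the row atomically, so two threads that see
    // the same new feature at the same time still share a single row.
    void put(FeatureKey feature, int sampleIndex, String value) {
        rows.computeIfAbsent(feature, k -> new String[nSamples])[sampleIndex] = value;
    }

    // Returns null if the feature is unknown or the sample has no value for it.
    String get(FeatureKey feature, int sampleIndex) {
        String[] row = rows.get(feature);
        return row == null ? null : row[sampleIndex];
    }

    int nFeatures() {
        return rows.size();
    }
}
```

Each sample writes only to its own column index, so concurrent writers never touch the same array slot. Reads should still happen only after all worker threads have finished, so that the written values are guaranteed to be visible.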

Multithreading is based on TaskExecutor; I copied the relevant code from SampleTask and CupAnalyzer.
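For anyone unfamiliar with the pattern, the per-sample threading can be sketched roughly as below. This does not use TaskExecutor, whose API is not shown here; it swaps in the standard `java.util.concurrent.ExecutorService` instead. `CohortExtractionSketch`, `extractFeatures`, and the `id_length` feature are hypothetical placeholders for the real per-sample extraction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch using the JDK executor API rather than hmftools' TaskExecutor.
class CohortExtractionSketch {

    // Placeholder extractor: returns feature name -> value for a single sample.
    static Map<String, Double> extractFeatures(String sampleId) {
        Map<String, Double> features = new HashMap<>();
        features.put("id_length", (double) sampleId.length());
        return features;
    }

    // Runs one task per sample and collects the results into a
    // feature -> per-sample values map (n_features x n_samples).
    static Map<String, double[]> extractCohort(List<String> sampleIds, int threads) {
        int nSamples = sampleIds.size();
        Map<String, double[]> matrix = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < nSamples; i++) {
                final int sampleIndex = i;
                futures.add(executor.submit(() -> {
                    // Each task writes only to its own column, sampleIndex.
                    extractFeatures(sampleIds.get(sampleIndex)).forEach((feature, value) ->
                            matrix.computeIfAbsent(feature, k -> new double[nSamples])[sampleIndex] = value);
                }));
            }
            // Waiting on every future surfaces task failures and makes
            // all writes visible to this thread before the matrix is used.
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Cohort feature extraction failed", e);
        } finally {
            executor.shutdown();
        }
        return matrix;
    }
}
```

Writing the matrix to file would then happen only after `extractCohort` returns, which matches the "once all samples are processed" step described above.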