Around 90% of the L2G predictions have an L2G score below 0.05. This means that most of the loaded data has very little value from a gene prioritisation perspective. After adding the L2G features, this dataset is even heavier, and it will continue to grow in the future.
After discussing with Genetics, Data, BE, and FE representatives, we concluded that filtering out L2G predictions with a score below 0.05 would be a reasonable approach.
We would like to add an optional score filter to the L2G predictions step that trims the output data, and set it to 0.05 in the orchestration configuration.
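A minimal sketch of what the optional filter could look like, assuming the predictions live in a Spark DataFrame with a `score` column. The `filter_l2g_predictions` helper, the `l2g_threshold` parameter name, and the configuration key shown in the comments are illustrative assumptions, not the actual step implementation.

```python
from typing import Optional

from pyspark.sql import DataFrame


def filter_l2g_predictions(
    predictions: DataFrame, l2g_threshold: Optional[float] = 0.05
) -> DataFrame:
    """Optionally drop L2G predictions whose score falls below the threshold.

    Passing None keeps the full, unfiltered dataset (e.g. to recreate the
    low-score predictions on demand with the same step and model).
    """
    if l2g_threshold is None:
        return predictions
    return predictions.filter(predictions.score >= l2g_threshold)


# Hypothetical orchestration configuration entry:
# l2g_prediction_step:
#   l2g_threshold: 0.05
```

Keeping the threshold as an optional step parameter means the unfiltered dataset can still be produced ad hoc by rerunning the step with the filter disabled.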
Note: At some point we discussed whether to also save the dataset containing the predictions with scores below 0.05. Recreating that dataset would be relatively easy by rerunning the same step and model with different parameters, so it is probably not worth generating it for every data release.