Open laurabiggins opened 7 years ago
We have a script that filters a gene info file by selected criteria (e.g. GC content, length), but we want to generate many sets of genes. We'll define a window (a range of values) for each criterion and randomly select genes from within it. We don't want blocks of genes that all have exactly the same length, GC content, etc., because we would then probably get clusters of near-identical genes, which would skew the functional analysis even further.
I've added options to the filter_gene_info.pl script in /processing/in_silico/ so that the output file can contain a specified number of genes. Genes are filtered on the chosen criteria first, then the requested number of genes is randomly selected from those that pass.
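The filter-then-sample approach can be sketched roughly as follows. This is a Python illustration only, not the Perl code in filter_gene_info.pl: the function names `filter_by_windows` and `sample_genes`, the field names `gc` and `length`, and the use of a seed for reproducibility are all hypothetical assumptions. It also assumes genes are drawn without replacement, so no gene appears twice in one set.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical gene record; keys like "gc" and "length" are illustrative,
# not the actual column names in the gene info file.
Gene = Dict[str, Any]
Windows = Dict[str, Tuple[float, float]]


def filter_by_windows(genes: List[Gene], windows: Windows) -> List[Gene]:
    """Keep genes whose value for every criterion lies inside its [low, high] window."""
    return [
        g for g in genes
        if all(low <= g[field] <= high for field, (low, high) in windows.items())
    ]


def sample_genes(genes: List[Gene], windows: Windows, n: int,
                 seed: Optional[int] = None) -> List[Gene]:
    """Filter first, then randomly pick n distinct genes from those that pass."""
    passing = filter_by_windows(genes, windows)
    if n > len(passing):
        raise ValueError(f"only {len(passing)} genes pass the filters; cannot pick {n}")
    rng = random.Random(seed)  # seeded generator so a given set can be regenerated
    return rng.sample(passing, n)  # sampling without replacement


if __name__ == "__main__":
    # Toy data: fake genes with random GC content and length.
    data_rng = random.Random(0)
    genes = [
        {"id": i, "gc": data_rng.uniform(0.3, 0.7), "length": data_rng.randint(500, 5000)}
        for i in range(5000)
    ]
    windows = {"gc": (0.45, 0.6), "length": (1000, 4000)}
    # Several independent sets of 200 genes, one per seed.
    datasets = [sample_genes(genes, windows, 200, seed=s) for s in range(5)]
    for s, ds in enumerate(datasets):
        print(s, len(ds))
```

Using windows rather than exact values, and a different seed per set, is what keeps each sampled set from collapsing into clusters of near-identical genes.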
generate biased in silico datasets of 200 genes