laurabiggins / biases

analysis of biases
in silico datasets #2

Open laurabiggins opened 7 years ago

laurabiggins commented 7 years ago

Generate biased in silico datasets of 200 genes.

laurabiggins commented 7 years ago

We've got a script that filters a gene info file by selected criteria (e.g. GC content, length), but we want to generate many sets of genes. We'll define windows for each criterion and randomly select genes within them. We don't want blocks of genes with exactly the same length, GC content, etc., because that would probably produce clusters of almost identical genes, which would skew the functional analysis even further.
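
To make the window idea concrete, here is a rough Python sketch (not the existing script). It assumes each gene is a dict with a numeric field such as `gc`; the function names `make_windows` and `sample_gene_set`, and the field name, are hypothetical and only for illustration.

```python
import random


def make_windows(low, high, width, step):
    """Return (start, end) windows of fixed width that fit inside [low, high].

    Integer bounds (e.g. GC content in percent) avoid float rounding drift.
    A step smaller than the width gives overlapping windows.
    """
    windows = []
    start = low
    while start + width <= high:
        windows.append((start, start + width))
        start += step
    return windows


def sample_gene_set(genes, key, window, n, rng):
    """Randomly pick n genes whose `key` value lies in [start, end).

    Returns None if the window holds fewer than n genes, so the caller
    can skip it (how to treat sparse windows is an assumption here).
    """
    start, end = window
    candidates = [g for g in genes if start <= g[key] < end]
    if len(candidates) < n:
        return None
    # rng.sample draws without replacement, so no gene appears twice.
    return rng.sample(candidates, n)
```

Calling `sample_gene_set` repeatedly with the same window and a shared `random.Random` instance would give many different gene sets that are biased towards that window without being identical in length or GC content.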

laurabiggins commented 7 years ago

I've added options to the filter_gene_info.pl script in /processing/in_silico/ so that the output file can contain a specified number of genes. Filtering on the categories is performed first, then the specified number of genes is randomly selected from those that pass the filters.
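
For anyone reading along, the order of operations looks roughly like the Python sketch below. This is not the Perl script or its actual options: `passes`, `filter_then_sample`, the `criteria` layout and the `seed` argument are all made up for illustration, and returning every gene when fewer than `n` pass the filters is an assumption rather than what filter_gene_info.pl necessarily does.

```python
import random


def passes(gene, criteria):
    """True if the gene lies within every (min, max) range in criteria."""
    return all(lo <= gene[field] <= hi for field, (lo, hi) in criteria.items())


def filter_then_sample(genes, criteria, n, seed=None):
    """Filter on the criteria first, then randomly select n of the survivors.

    If n or fewer genes pass the filters, all of them are returned
    (an assumption made for this sketch).
    """
    filtered = [g for g in genes if passes(g, criteria)]
    if len(filtered) <= n:
        return filtered
    # A seeded generator makes a particular selection reproducible.
    return random.Random(seed).sample(filtered, n)
```

For example, `filter_then_sample(genes, {"gc": (40, 50), "length": (1000, 5000)}, 200, seed=1)` would return up to 200 genes that satisfy both ranges.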