Integrated the new tokenizer and model code for the newly published version of Geneformer (https://www.biorxiv.org/content/10.1101/2024.08.16.608180v1.full.pdf), with minor changes to the original code for better compatibility with the package.
Wrote a new Geneformer config that accommodates both versions, including the different models (number of layers, etc.) available for each version. The exact files to download for each configuration are resolved under the hood; the user only has to specify a model name from the available ones (gf-12L-30M-i2048, gf-12L-95M-i4096, etc.).
Adapted the existing unit tests and wrote additional ones for the newly added functions.
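For reviewers, here is a rough sketch of the name-based lookup idea described above. It is not the code in this PR: `ModelSpec`, `MODEL_SPECS`, `resolve_model`, `files_to_download`, the `"v1"`/`"v2"` labels, and the file names are all hypothetical placeholders, and the layer counts and input sizes are simply read off the model names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical per-model settings; field names are illustrative only."""
    version: str      # assumed label for the Geneformer release
    num_layers: int   # the "12L" part of the model name
    input_size: int   # the "i2048" / "i4096" part of the model name


# Hypothetical registry keyed by the user-facing model name.
MODEL_SPECS = {
    "gf-12L-30M-i2048": ModelSpec(version="v1", num_layers=12, input_size=2048),
    "gf-12L-95M-i4096": ModelSpec(version="v2", num_layers=12, input_size=4096),
}


def resolve_model(model_name: str) -> ModelSpec:
    """Return the spec for a model name, or fail listing the valid names."""
    try:
        return MODEL_SPECS[model_name]
    except KeyError:
        available = ", ".join(sorted(MODEL_SPECS))
        raise ValueError(
            f"Unknown model '{model_name}'. Available models: {available}"
        ) from None


def files_to_download(model_name: str) -> list[str]:
    """Build the list of files for a model (placeholder paths, not real ones)."""
    spec = resolve_model(model_name)
    prefix = f"{spec.version}/{model_name}"
    return [f"{prefix}/config.json", f"{prefix}/model.bin"]
```

With a layout like this, the user-facing call only needs the model name, e.g. `files_to_download("gf-12L-95M-i4096")`, and an unknown name fails early with the list of valid options.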