nataled / PRO


Fix wormbase problem #24

Open nataled opened 6 years ago

nataled commented 6 years ago

For some reason, WormBase DR lines differ from those of all other databases: they carry extra information for isoforms.

DR WormBase; F32A5.4a; CE01274; WBGene00017970; -. [Q19948-1]
DR WormBase; F32A5.4b; CE32640; WBGene00017970; -. [Q19948-2]
DR WormBase; T11G6.1a; CE47289; WBGene00002001; hars-1. [P34183-2]
DR WormBase; T11G6.1b; CE33829; WBGene00002001; hars-1. [P34183-1]

Two issues:
1) when the gene name is '-', use the ORF name instead
2) process the line only up to the period

nataled commented 6 years ago

Might only need to split the name based on period followed by open square bracket. This will then work for any other databases that use this syntax.
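A rough sketch of how that could look, in Python for illustration only. The function names `parse_dr_line` and `wormbase_gene_name` are hypothetical and not taken from the existing code, and the sketch assumes the ORF name is the first field after the database name and the gene name is the last field before the period. Matching the period together with the bracket at the end of the line matters because ORF names such as F32A5.4a contain periods themselves.

```python
import re

# Hypothetical helpers for illustration; not part of the existing codebase.
# Matches a trailing ". [ISOFORM]" suffix: a period, optional whitespace,
# then a bracketed isoform identifier at the end of the line.
_ISOFORM_SUFFIX = re.compile(r"\.\s*\[([^\]]+)\]\s*$")


def parse_dr_line(line):
    """Split a DR line into (database, fields, isoform).

    isoform is None when the line has no bracketed suffix.
    """
    body = line.strip()
    if body.startswith("DR"):
        body = body[2:].strip()

    isoform = None
    match = _ISOFORM_SUFFIX.search(body)
    if match:
        isoform = match.group(1)
        body = body[:match.start()]  # drop the period and everything after it
    else:
        body = body.rstrip(".")  # ordinary DR line: drop the final period

    parts = [part.strip() for part in body.split(";")]
    return parts[0], parts[1:], isoform


def wormbase_gene_name(line):
    """Return the gene name, falling back to the ORF name when it is '-'.

    Assumes the field order shown in the examples above:
    ORF name; protein ID; gene ID; gene name.
    """
    database, fields, _isoform = parse_dr_line(line)
    if database != "WormBase":
        raise ValueError("not a WormBase DR line: " + line)
    orf_name, gene_name = fields[0], fields[-1]
    return orf_name if gene_name == "-" else gene_name
```

With the lines above, `wormbase_gene_name` would give `F32A5.4a` for the first line (gene name is '-') and `hars-1` for the third. Lines from other databases without the bracketed suffix go through `parse_dr_line` unchanged apart from losing the final period.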