Open nataled opened 6 years ago
For some reason, WormBase DR lines differ from those of all other databases: they carry extra information for isoforms.
DR WormBase; F32A5.4a; CE01274; WBGene00017970; -. [Q19948-1]
DR WormBase; F32A5.4b; CE32640; WBGene00017970; -. [Q19948-2]
DR WormBase; T11G6.1a; CE47289; WBGene00002001; hars-1. [P34183-2]
DR WormBase; T11G6.1b; CE33829; WBGene00002001; hars-1. [P34183-1]
Two issues:
1) when the gene name is '-', use the ORFname
2) process the line only up to the period
It might be enough to split the name on a period followed by an open square bracket. That would then also work for any other database that uses this syntax.
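A rough sketch of what that split could look like, in Python. The function names `parse_dr_line` and `wormbase_gene_name` are made up for illustration and are not taken from the existing code. It also assumes the caller already has the ORF name from somewhere else in the entry, since the DR line itself does not say where it should come from.

```python
import re

# Terminating period of a DR line, optionally followed by "[ISOFORM]".
# Anchored at end of line so the period inside "F32A5.4a" is not matched.
_TAIL = re.compile(r"\.\s*(?:\[([^\]]+)\])?\s*$")


def parse_dr_line(line):
    """Return (database, fields, isoform) for one DR line.

    isoform is None when the line has no trailing "[...]" part.
    """
    body = re.sub(r"^DR\s+", "", line.strip())
    isoform = None
    match = _TAIL.search(body)
    if match:
        isoform = match.group(1)
        body = body[:match.start()]  # keep only the text before the period
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:], isoform


def wormbase_gene_name(fields, orf_name):
    """Use the last field as gene name, or orf_name when it is '-'.

    orf_name is supplied by the caller (assumption, see note above).
    """
    name = fields[-1]
    return orf_name if name == "-" else name
```

Because the trailing bracket is optional in the pattern, lines from databases that do not use the isoform suffix go through the same function and simply come back with `isoform` set to `None`.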