iangow / filings

Code to manage data related to SEC filings on EDGAR.

Regex - up to date #6

Open blucap opened 5 years ago

blucap commented 5 years ago

Hi Ian, I am using bits of your code for downloading CUSIPs and CIKs. I updated the regex to:

/CUSIP(?:\s+(?:No\.|#|Number):?)?\s+([0-9A-Z]{1,3}[\s-]?[0-9A-Z]{3}[\s-]?[0-9A-Z]{2}[\s-]?\d{1})/

because in some cases there are only spaces between 'CUSIP' and the number; see

https://www.sec.gov/Archives/edgar/data/3952/0000906344-03-000065.txt

which contains only CUSIP 19483106.
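As a quick check of the pattern above, here is a minimal Python sketch. The helper name `find_cusip` and the second and third sample strings are illustrative assumptions, not part of the repository's code; only the first sample comes from the filing linked above.

```python
import re

# The pattern from the comment above, copied verbatim (without the / delimiters).
CUSIP_RE = re.compile(
    r"CUSIP(?:\s+(?:No\.|#|Number):?)?\s+"
    r"([0-9A-Z]{1,3}[\s-]?[0-9A-Z]{3}[\s-]?[0-9A-Z]{2}[\s-]?\d{1})"
)


def find_cusip(text):
    """Return the first CUSIP-like string captured by the pattern, or None."""
    match = CUSIP_RE.search(text)
    return match.group(1) if match else None


# Space-only case taken from the filing referenced above.
print(find_cusip("CUSIP 19483106"))

# Made-up samples exercising the optional "No." / "Number:" labels.
print(find_cusip("CUSIP No. 025537101"))
print(find_cusip("CUSIP Number: 025537 10 1"))
```

Because the first character class is `{1,3}`, the pattern accepts 7 to 9 characters, which is why the 8-digit value in this filing is captured. Note that the match is case-sensitive as written.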

However, in some cases there is a colon and some whitespace between CUSIP and the number, for example: CUSIP: 025537101

Did you manage to deal with that?

Cheers, Martien
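One possible way to handle the colon case raised in this comment, sketched in Python. It assumes the colon may follow either "CUSIP" itself or the No./#/Number label. The names `POSTED`, `VARIANT`, and `NUMBER` are hypothetical, and the variant has only been checked against the two strings quoted in this thread, not against a sample of real filings.

```python
import re

# Capture group shared by both patterns (taken from the regex posted above).
NUMBER = r"([0-9A-Z]{1,3}[\s-]?[0-9A-Z]{3}[\s-]?[0-9A-Z]{2}[\s-]?\d{1})"

# Pattern as posted: the optional colon is only allowed after a label.
POSTED = re.compile(r"CUSIP(?:\s+(?:No\.|#|Number):?)?\s+" + NUMBER)

# Hypothetical variant: the optional colon is moved outside the label group,
# so it can also follow "CUSIP" directly.
VARIANT = re.compile(r"CUSIP(?:\s+(?:No\.|#|Number))?:?\s+" + NUMBER)

for sample in ("CUSIP 19483106", "CUSIP: 025537101"):
    posted = POSTED.search(sample)
    variant = VARIANT.search(sample)
    print(
        sample,
        "| posted:", posted.group(1) if posted else None,
        "| variant:", variant.group(1) if variant else None,
    )
```

With the posted pattern, "CUSIP: 025537101" is not matched because `\s+` cannot consume the colon. The variant accepts it while still matching the space-only form.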

iangow commented 5 years ago

@blucap I am in the process of moving this code over to a new repository. The idea is to have a cleaner repository that is better for open-source purposes (e.g., documentation and better structure). I am not sure whether this code has made the transition over.

If the code weren't in such a sorry state, I'd suggest editing and submitting a pull request. But let me see where we are on this particular code first.

blucap commented 5 years ago

Hi Ian,

I coded it very differently. Happy to send it by email.

Martien

iangow commented 4 years ago

@blucap We have done quite a bit of work on this in the edgar repository (see here), but the person helping with this made it very complicated and I never got a chance to dial in your tweaks; maybe I will revisit those once we get things under control there.