mccgr / edgar

Code to manage data related to SEC EDGAR
31 stars 15 forks source link

Understand how TFN tables map to filings #31

Closed iangow closed 5 years ago

iangow commented 5 years ago

Data in tfn.table1 and tfn.table2.

iangow commented 5 years ago

Addressing #30 will help here.

bdcallen commented 5 years ago

@iangow The Thomson Reuters Insiders Data Feed Manual has a pretty good summary of tables 1 and 2. Table 1 contains information on transactions of non-derivative stock, and Table 2 contains information on derivative stock. Having had a look at some of the xml files, it seems many of the form 4's have either a non-trivial <DerivativeTable> (eg. this), a non-trivial <nonDerivativeTable> (eg. this), or both (eg. this). In fact, in the latter example, there is a table of one of these types for each transaction reported under the filing (see the html for example).

Also, there seem to be some numbers which Thomson Reuters have scraped which map to numbers on the pages of the filing. For instance, the dcn field of table 2 for this filing, which is 03817287, is exactly equal the Film No. listed on the html page, in the last section of details on the owner ('AMOROSO ALFRED J').

bdcallen commented 5 years ago

@iangow I've had a look at both tables for the example filing I gave above, with multiple transactions of both derivative and non-derivative stock. The Film No. for this is 111142681, and doing

SELECT * FROM tfn.table1 WHERE owner = 'AMOROSO ALFRED J' AND dcn = '111142681'

and

SELECT * FROM tfn.table2 WHERE owner = 'AMOROSO ALFRED J' AND dcn = '111142681'

one can see that there are 10 entries for this filing in table 1, and 5 entries in table 2 (sorry, not displaying the output since there are too many columns), corresponding to all the transactions in the filing. So each row of these tables corresponds to a single transaction from some filing, and thus there are generally multiple rows in each table for the one filing.

bdcallen commented 5 years ago

@iangow Also, curiously, in tables 1 and 2, there seem to be no columns with the accession numbers and cik's, although there are some fields cusip6, cusip2 and cusipx which correspond with CUSIP numbers.

bdcallen commented 5 years ago

@iangow I just had a go at trying to connect the CIK number from filings to the CUSIP numbers in tfn.table2 for a particular filing, that I picked at random from filings. If one looks at this filing, which I generated from the file_name from filings, if one takes the Film no. which is 17537136, and does

SELECT * FROM tfn.table2 WHERE dcn = '17537136'

one finds that the cusip6 is 108035, the cusip2 is 10, and the cusipx is 6. Putting this together yields a full CUSIP number 108035106.

If one then takes the CIK number associated with the filing, which is 846617 (for BRIDGE BANCORP INC), and does

SELECT DISTINCT cusip, cik FROM edgar.cusip_cik WHERE cik = 846617

one sees that 108035106 is indeed one of the CUSIPS (the others were 10835106 (perhaps a mistake) and, curiously, a null).

Perhaps there is scope to do joins partially between these datasets through edgar.cusip_cik, though I am aware we obtained that table through the SC 13D forms, which are only filed when someone owns more than 5% of a firm. Also, there was a relatively recent filing which was a sale of derivative stock, but when I used the filming number for dcn, it didn't show up in tfn.table2 as it should have. I assume that could be because we haven't updated table2 on our side, though I'm curious to know how often tables 1 and 2 are updated.

iangow commented 5 years ago

@bdcallen

Don't worry about matching CUSIPs with CIKs for now. It's enough to know that the tfn tables do not include CIKs, as scraping data on CIKs is something of primary interest with these forms. I don't believe anything we have in the edgar tables maps to "film number" (and I believe we don't have any ID fields in common with the tfn data … this is a pain). I think we'd have to scrape film numbers if we wanted a solid link between tfn and edgar tables.