mccgr / edgar

Code to manage data related to SEC EDGAR

Work out what to do with footnotes with parent node names #35

Closed bdcallen closed 4 years ago

bdcallen commented 5 years ago

@iangow It turns out that my function get_full_footnote_indices sometimes sets footnote_variable to the name of a node that is the parent of several variables extracted by get_derivative_df and get_nonDerivative_df. An example is this filing, for which get_full_footnote_indices assigns transactionCoding to footnote_variable for several rows of its table, as can be seen below.

> get_full_footnote_indices(xml_root, file_name, document)
                                     file_name            document  table seq footnote_variable footnote_index
1  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   1 transactionCoding             F1
2  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   1 natureOfOwnership             F2
3  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   2 transactionCoding             F1
4  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   2 natureOfOwnership             F2
5  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   3 transactionCoding             F1
6  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   3 natureOfOwnership             F2
7  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   4 transactionCoding             F1
8  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   4 natureOfOwnership             F2
9  edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   5 transactionCoding             F1
10 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   5 natureOfOwnership             F2
11 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   6 transactionCoding             F1
12 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   6 natureOfOwnership             F2
13 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   7 transactionCoding             F1
14 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   7 natureOfOwnership             F2
15 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1   9 natureOfOwnership             F3
16 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1  10 natureOfOwnership             F3
17 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1  11 natureOfOwnership             F3
18 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml table1  12 natureOfOwnership             F3

The footnotes for this case are

> get_footnotes(xml_root, file_name, document)
                                    file_name            document index
1 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml    F1
2 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml    F2
3 edgar/data/1001233/0001209191-07-018488.txt bsf28410_bsf1el.xml    F3
                                                                                                                                                                                                                                                             footnote
1                                                                                                                            The sales reported in this Form 4 were effected pursuant to a Rule 10b5-1 trading plan adopted by the Reporting Person on July 28, 2006.
2                                                                                                    These shares are held in the Edward O. Lanphier II and Cameron M. Lanphier Trust U/T/A August 30, 2002, Edward O. Lanphier II and Cameron M. Lanphier, Trustees.
3 Reporting Person disclaims beneficial ownership of the shares held by each of his children and this report shall not be deemed to be an admission that Mr. Lanphier is the beneficial owner of such securities for purposes of Section 16 or for any other purpose.

transactionCoding is the name of a parent node that contains the variables transactionFormType, transactionCode and equitySwapInvolved. If you look at the XML file for this filing, footnote F1 is indeed written as a direct child of transactionCoding. This is not in accordance with the SEC's style guide for the XML documents for Forms 3, 4 and 5 (on which I have largely based my scraping functions). Instead, the footnote should be a child of one of transactionFormType, transactionCode or equitySwapInvolved.

I'm wondering how we should treat these cases: assign the footnote to all of the parent node's children, or keep the parent node's name as footnote_variable?
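For comparison, the first option (assigning the footnote to every child of the parent node) could look roughly like the sketch below. This is only an illustration under assumptions: the `parent_children` lookup and the `expand_parent_footnotes` helper are hypothetical names, not functions in the repo, and the input is a cut-down stand-in for the table above.

```r
# Hypothetical lookup: parent node name -> the child variables it contains.
parent_children <- list(
  transactionCoding = c("transactionFormType", "transactionCode", "equitySwapInvolved")
)

# Hypothetical helper: replace each row whose footnote_variable is a parent
# node with one row per child variable, keeping all other rows unchanged.
expand_parent_footnotes <- function(df, lookup = parent_children) {
  is_parent <- df$footnote_variable %in% names(lookup)
  if (!any(is_parent)) return(df)

  kept <- df[!is_parent, , drop = FALSE]
  parents <- df[is_parent, , drop = FALSE]

  # Repeat each parent row once per child, then overwrite the variable name.
  n_children <- lengths(lookup[parents$footnote_variable])
  expanded <- parents[rep(seq_len(nrow(parents)), times = n_children), , drop = FALSE]
  expanded$footnote_variable <- unlist(lookup[parents$footnote_variable], use.names = FALSE)

  out <- rbind(kept, expanded)
  out <- out[order(out$seq, out$footnote_index), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Cut-down stand-in for the table above (file_name and document omitted).
indices <- data.frame(
  table = "table1",
  seq = c(1, 1, 9),
  footnote_variable = c("transactionCoding", "natureOfOwnership", "natureOfOwnership"),
  footnote_index = c("F1", "F2", "F3"),
  stringsAsFactors = FALSE
)

expanded_indices <- expand_parent_footnotes(indices)
print(expanded_indices)
```

Under this option F1 would be attached to all three child variables for each affected row, which triples those rows; keeping the parent name avoids that at the cost of a footnote_variable value that does not correspond to a data column.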

bdcallen commented 5 years ago

@iangow As I mentioned in an email to you, I've decided to leave footnote_variable as transactionCoding in these cases for now. This is safe because none of its child node names (transactionCode, transactionFormType or equitySwapInvolved) appear in the footnote_variable field, so these cases will be easy to fix if we decide to make a change. We can decide what to do here after we've downloaded the data.
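The condition that makes deferring safe can be checked mechanically. A minimal sketch, assuming the footnote_variable column is available as a character vector; `safe_to_defer` and `children_of_transactionCoding` are hypothetical names used only for illustration:

```r
# Child variables of transactionCoding, as listed in this thread.
children_of_transactionCoding <- c("transactionFormType", "transactionCode", "equitySwapInvolved")

# Hypothetical helper: TRUE when no child name already appears in
# footnote_variable, so parent rows can later be recoded without ambiguity.
safe_to_defer <- function(footnote_variable, children = children_of_transactionCoding) {
  !any(footnote_variable %in% children)
}

# The two values seen in the example filing's footnote_variable column.
observed <- c("transactionCoding", "natureOfOwnership")
print(safe_to_defer(observed))
```

If a child name did turn up alongside the parent name, the check would return FALSE and those filings would need a closer look before any bulk recoding.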

bdcallen commented 4 years ago

@iangow I am going to close this for now.