mccgr / edgar

Code to manage data related to SEC EDGAR

Handle alternative variable names from older Form 3, 4, 5 filings #45

Closed bdcallen closed 4 years ago

bdcallen commented 5 years ago

@iangow This is a follow-on from #42. If one looks at the derivativeSecurity or nonDerivativeSecurity nodes in, for example, this filing, one sees that most of the variable names are the same as in newer filings:

<nonDerivativeSecurity>
  <securityTitle>
    <value>Common Stock</value>
  </securityTitle>
  <transactionDate>
    <value>2003-07-01</value>
  </transactionDate>
  <transactionCoding>
    <transactionFormType>4</transactionFormType>
    <transactionCode>S</transactionCode>
    <equitySwapInvolved>0</equitySwapInvolved>
    <footnoteId id="F1"/>
  </transactionCoding>
  <transactionAmounts>
    <transactionShares>
      <value>17097</value>
    </transactionShares>
    <transactionValue>
      <value>11.05</value>
    </transactionValue>
    <transactionAcquiredDisposedCode>
      <value>D</value>
    </transactionAcquiredDisposedCode>
  </transactionAmounts>
  <postTransactionAmounts>
    <sharesOwnedFollowingTransaction>
      <value>2031898</value>
    </sharesOwnedFollowingTransaction>
  </postTransactionAmounts>
  <ownershipNature>
    <directOrIndirectOwnership>
      <value>I</value>
    </directOrIndirectOwnership>
    <natureOfOwnership>
      <value>
        By self, as General Partner of Our Ship Limited Partnership Ltd.
      </value>
    </natureOfOwnership>
  </ownershipNature>
</nonDerivativeSecurity>

but the transactionAmounts node contains a subnode called transactionValue, whereas newer filings use transactionPricePerShare, which is the variable name currently in the tables. I think these are probably the same thing.

There could also be other variable names that appear only in the older filings.

bdcallen commented 5 years ago

@iangow Having had a close look at quite a few filings, comparing the xml documents with the htmls, and the htmls for these old filings with the htmls for modern filings, I'm confident that transactionValue is indeed transactionPricePerShare, rather than transactionTotalValue. I'm putting this piece of code

# Rename the legacy column to the name used in the tables, if it is present
if ('transactionValue' %in% colnames(part_old)) {
  colnames(part_old)[colnames(part_old) == 'transactionValue'] <- 'transactionPricePerShare'
}

in both get_nonDerivative_df and get_derivative_df. I will keep an eye on whether there are more inconsistencies as I go.
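
If more legacy names turn up, the same rename could be driven by a lookup table rather than one if block per name. A minimal base R sketch, assuming a hypothetical lookup vector `legacy_names` and a hypothetical helper `standardize_names` (neither exists in the repository); the only mapping confirmed in this thread is transactionValue to transactionPricePerShare:

```r
# Hypothetical lookup: names are legacy column names, values are current ones.
# Only the transactionValue mapping comes from this issue.
legacy_names <- c(transactionValue = "transactionPricePerShare")

# Hypothetical helper: rename whichever legacy columns are present in df.
standardize_names <- function(df, lookup = legacy_names) {
  hit <- colnames(df) %in% names(lookup)
  colnames(df)[hit] <- unname(lookup[colnames(df)[hit]])
  df
}

# Toy data frames standing in for an old and a new filing
part_old <- data.frame(transactionShares = 17097, transactionValue = 11.05)
part_new <- data.frame(transactionShares = 100, transactionPricePerShare = 2.5)

colnames(standardize_names(part_old))
colnames(standardize_names(part_new))
```

A data frame that already uses the current names passes through unchanged, so the helper could be called in both get_nonDerivative_df and get_derivative_df without a guard.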

iangow commented 5 years ago

> @iangow Having had a close look at quite a few filings, comparing the xml documents with the htmls, and the htmls for these old filings with the htmls for modern filings, I'm confident that transactionValue is indeed transactionPricePerShare, rather than transactionTotalValue. I'm putting this piece of code ...

@bdcallen Again, I think it is better to use a dplyr-based approach. So mutate_if('transactionValue' %in% colnames(.), transactionValue = transactionPricePerShare) (or something using if_else or even just rename_if). I think this will help to produce an easier-to-maintain codebase across the repository (and other MCCGR repositories).

iangow commented 5 years ago

@bdcallen Please don't assign issues to me if I'm not the one doing (at least some of) the work. I see these issues and comments thereon if you mention @iangow.

bdcallen commented 5 years ago

@iangow

> df <- data.frame(a = c(1, 2), b = c(3, 5), c = c(8, 13))
> df
  a b  c
1 1 3  8
2 2 5 13

> colnames(df) <- ifelse(colnames(df) == 'b', 'd', colnames(df))
> df
  a d  c
1 1 3  8
2 2 5 13

I couldn't get mutate_if to work on the above example (it gave me an error when I tried it as you wrote it above), but ifelse worked as desired. I have already made the change I mentioned in my earlier comment, and it works fine, but the ifelse equivalent is simpler to write and read, so I can switch to it once my code has finished processing the filings. Otherwise, I think we can consider this issue closed.
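
For the dplyr-based route, one option is rename() combined with any_of(), which renames a column only if it exists. mutate_if() expects a predicate that is evaluated per column plus a function to apply to the selected columns, so it is not suited to a conditional rename. A sketch on the same toy data frame, assuming a reasonably recent dplyr (1.0.0 or later) is installed; the `lookup` vector is illustrative:

```r
library(dplyr)

df <- data.frame(a = c(1, 2), b = c(3, 5), c = c(8, 13))

# Named vector: names are the new column names, values are the old ones.
# The second entry is absent from df on purpose.
lookup <- c(d = "b", transactionPricePerShare = "transactionValue")

# any_of() skips old names that are not in the data, so no if() is needed.
renamed <- df %>% rename(any_of(lookup))
colnames(renamed)
```

With all_of() in place of any_of(), a missing column would raise an error instead of being skipped, which may be preferable when the old name is expected to be present.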