gregversteeg / CorEx

CorEx or "Correlation Explanation" discovers a hierarchy of informative latent factors. This reference implementation has been superseded by other versions below.
GNU General Public License v2.0

S&P500 data #7

Closed: paseman closed this issue 3 years ago.

paseman commented 3 years ago

Regarding the 2015 paper: I don't see the S&P 500 component data in any data directory, and QuantQuote does not seem to have it. Pointing me to a data source would be helpful. Thanks.

gregversteeg commented 3 years ago

Sorry, although some companies provide historical data for free, they don't allow me to redistribute it myself. quandl.com still seems to have free samples of the S&P 500 data. For this paper, we got the data from tradingeconomics.com (though I don't know whether they have any for free).

paseman commented 3 years ago

Thanks. But since my goal is to reproduce your figure, to check that I am running the code correctly, could you please give me the names of the 388 companies you used, the month in 1998 that you started the run, the month in 2013 that you ended it, and any other parameters I might need to reproduce your published results? Thanks.

gregversteeg commented 3 years ago

aa,aapl,abc,abt,ace,act,adbe,adi,adm,adp,adsk,aee,aep,aes,aet,afl,agn,aig,aiv,all,altr,alxn,amat,amd,amgn,amzn,an,anf,aon,apa,apc,apd,apol,arg,ati,avb,avp,avy,axp,azo,ba,bac,bax,bbby,bbt,bby,bcr,bdx,beam,ben,bfb,bhi,bk,bll,bmc,bms,bmy,brkb,bsx,bwa,bxp,c,ca,cag,cah,cam,cat,cb,cbs,cce,ccl,celg,cern,chk,chrw,ci,cinf,cl,clf,clx,cma,cmcsa,cmi,cms,cnp,cof,cog,cop,cost,cpb,csc,csco,csx,ctas,ctl,ctxs,cvc,cvs,cvx,d,dd,de,dell,df,dgx,dhi,dhr,dis,dltr,dnb,do,dov,dow,dri,dte,duk,dva,ea,ecl,ed,efx,eix,el,emc,emn,emr,eog,eqr,eqt,esrx,esv,etfc,etn,etr,expd,fast,fcx,fdo,fdx,fe,fhn,fisv,fitb,flir,fls,fmc,fosl,frx,ftr,gas,gci,gd,ge,gis,glw,gpc,gps,gt,gww,hal,har,has,hban,hcn,hcp,hd,hes,hig,hog,hon,hot,hp,hpq,hrb,hrl,hrs,hst,hsy,hum,ibm,iff,igt,intc,intu,ip,ipg,ir,irm,itw,jbl,jci,jcp,jdsu,jec,jnj,jpm,jwn,k,key,kim,klac,kmb,kmx,ko,kr,kss,l,leg,len,lh,lltc,lly,lm,lmt,lnc,low,lrcx,lsi,ltd,luk,luv,m,mar,mas,mat,mcd,mchp,mck,mco,mdt,mhfi,mkc,mmc,mmm,mo,molx,mrk,mro,ms,msft,msi,mtb,mu,mur,mwv,myl,nbl,nbr,ne,nee,nem,nfx,ni,nke,noc,nov,nsc,ntap,ntrs,nu,nue,nwl,oi,oke,omc,orcl,orly,oxy,payx,pbct,pbi,pcar,pcg,pcl,pcp,pdco,peg,pep,petm,pfe,pg,pgr,ph,phm,pki,pld,pll,pnc,pnr,pnw,pom,ppg,ppl,prgo,psa,pvh,px,pxd,qcom,r,rdc,rf,rhi,rl,rok,rop,rost,rrc,rtn,sbux,scg,schw,see,shw,sial,sjm,slb,slm,sna,sndk,so,spg,spls,srcl,sti,stj,stt,stz,swk,swn,swy,syk,symc,syy,t,tap,te,teg,ter,tgt,thc,tif,tjx,tmk,tmo,trow,trv,tsn,tso,tss,twx,txn,txt,tyc,unh,unm,unp,usb,utx,var,vfc,vlo,vmc,vno,vtr,vz,wag,wat,wdc,wec,wfc,wfm,whr,wm,wmb,wmt,wpo,wy,x,xel,xl,xlnx,xom,xray,xrx,yhoo,yum,zion

1998-01-09 to 2013-08-09

I shouldn't put the data online, but I can send it to you directly.

paseman commented 3 years ago

Quick question: given that there are 188 months from 199801 to 201308, why does "change" have 196 rows?

Thanks again. Bill

```python
import pickle

import pandas as pd

# encoding='bytes' keeps Python 2 str objects as bytes, so the stock names are decoded below.
with open('data/mchange,stocks,mdates.dat', 'rb') as f:
    change, stocks, mdates = pickle.load(f, encoding='bytes')
stocks = [s.decode('UTF-8') for s in stocks]
print(change.shape, len(stocks), len(mdates))

pd.DataFrame(change, columns=stocks).to_csv('data/mchange,stocks,mdates.csv', index=False)

# resample: 'M'onthly or 'W'eekly (pandas >= 2.2 prefers 'ME' over 'M')
df = pd.DataFrame([], index=pd.to_datetime(mdates, format="%Y%m%d")).resample('M').last()

if df.shape[0] != change.shape[0]:
    # 1998-01-09 to 2013-08-0
    print("Why don't row counts match?", df.shape[0], change.shape[0],
          len([98, 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) * 12 + 8)
```
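As a quick check of the 188 figure: the number of calendar months touched by a date range is (y2 - y1) * 12 + (m2 - m1) + 1. The sketch below computes it by hand, with pandas' `period_range`, and with paseman's expression (15 full years plus the eight months of 2013); the helper name `months_inclusive` is hypothetical.

```python
import pandas as pd


def months_inclusive(start: str, end: str) -> int:
    """Hypothetical helper: count the calendar months touched by [start, end]."""
    s, e = pd.Timestamp(start), pd.Timestamp(end)
    return (e.year - s.year) * 12 + (e.month - s.month) + 1


by_hand = months_inclusive("1998-01-09", "2013-08-09")
by_pandas = len(pd.period_range("1998-01", "2013-08", freq="M"))
# 15 full years (1998-2012) plus January-August 2013.
from_comment = len([98, 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) * 12 + 8

print(by_hand, by_pandas, from_comment)  # 188 188 188
```

All three agree on 188, so a 196-row `change` array cannot be one row per calendar month.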

On Mon, Jan 11, 2021 at 2:45 PM Greg Ver Steeg wrote:

> 1998-01-09 to 2013-08-09
>
> I shouldn't put the data online, but I can send it to you directly.

gregversteeg commented 3 years ago

I don’t remember, but it probably has to do with the periods being defined in terms of a number of trading days rather than calendar months. If you look at the dates, you’ll see they don’t always fall on the same day of the month.
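To illustrate how trading-day periods can diverge from calendar months, here is a minimal sketch. It is not the preprocessing used for the paper. It assumes weekdays stand in for trading days (exchange holidays are ignored) and uses a hypothetical block length of 20 trading days, then compares the number of such blocks with the number of calendar months over 1998-01-09 to 2013-08-09.

```python
import pandas as pd

# Assumption: weekdays approximate trading days (exchange holidays are ignored).
days = pd.bdate_range("1998-01-09", "2013-08-09")

# Calendar-month bucketing: one period per (year, month) the range touches.
n_months = days.to_period("M").nunique()

# Hypothetical alternative: consecutive blocks of BLOCK trading days each.
BLOCK = 20  # hypothetical block length, not taken from the paper
block_starts = days[::BLOCK]
n_blocks = len(block_starts)

print(len(days), n_months, n_blocks)   # 4066 188 204
print(block_starts[:4].day.tolist())   # [9, 6, 6, 3]
```

With this made-up block length the count overshoots 188, and the block start dates drift from the 9th to the 6th to the 3rd of the month. The only point is that fixed trading-day windows do not line up with calendar months, so the row count need not equal 188.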

On Jan 13, 2021, at 7:18 PM, paseman wrote:

> Quick question: given that there are 188 months from 199801 to 201308, why does "change" have 196 rows?