sul-dlss / was-pywb

Configuration for Stanford's pywb instance
https://swap.stanford.edu
Other
2 stars 0 forks source link

Youtube video fails to play #147

Open edsu opened 2 years ago

edsu commented 2 years ago

Reported by @peterchanws on 2022-09-29:

There was interruption when uploading the wacz file in https://argo.stanford.edu/view/druid:sv558gk6917 However, the upload was marked completed. However, the seed was not render properly (not able to play video) in swap. I uploaded the file (same file name) again https://argo.stanford.edu/view/druid:yx015cx1366. Problem still there. I uploaded the file (different file name) again. The problem is not solved. https://swap.stanford.edu/was/*?url=https%3A%2F%2Fnodontdie.com%2F. I upload the same file to stage https://argo-stage.stanford.edu/view/druid:db175fk4446 and it works fine https://swap-stage.stanford.edu/was/20220921195257/https://nodontdie.com/

The failure to play back can be seen when trying to play this video, which works in stage but not in production:

@andrewjbtw confirmed that the WACZ files for sv558gk6917 and yx015cx1366 in production and stage have the same fixity value. I also checked that the WARC files and CDX entries are identical.

However @peterchanaws also reported that another version was uploaded to SDR as well, which did not finish uploading:

This does appear to be 331 MB instead of 12.3 GB. It is possible that this truncated WACZ is somehow corrupting playback?

The CDX files as a whole are much larger as a whole in production, than in stage. There is also a previous Archive-It crawl from 2020 in Production, which is evident when comparing the index results:

edsu commented 2 years ago

Watching the HTTP requests in Production for:

https://swap.stanford.edu/was/20220921195257/https://nodontdie.com/louie-roots

You can see that this request generates a HTTP 500 response after spinning for 2 minutes:

https://swap.stanford.edu/was/20220921195743oe_/https://rr3---sn-o097znsl.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&mh=gi&mm=31%2C29&mn=sn-o097znsl%2Csn-n4v7snls&ms=au%2Crdu&mv=m&mvi=3&pl=20&initcwndbps=227500&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&mt=1663789767&fvip=4&fexp=24001373%2C24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgHI4spdAtiH_OC3nQts77b8KCHteX4wt3SADv23oi3NkCIQDbg4ga_E4nug6E_xhKh-bhX1K7BRlCjIhlIoxIwrr4EQ%3D%3D&cpn=8lLBIK9wGaHrC5jh&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc

whereas it works fine in Stage:

https://was-pywb-stage.stanford.edu/was/20220921195743oe_/https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA

It would be good to see what the actual error but it doesn't appear to be logged to the Apache error log.

edsu commented 2 years ago

I saw a bunch of these (104) in the /var/log/syslog when the request failed:

Oct  3 18:12:36 was-pywb-prod poetry[993635]: Mon Oct  3 18:12:36 2022 - *** uWSGI listen queue of socket ":8081" (fd: 5) full !!! (101/100) ***
edsu commented 1 year ago

I'm wondering if the duplicate entries from having ingested both

If you look at the underlying CDX request for this you see this response:

{"urlkey": "com,googlevideo,rr4---sn-a5mekn6s)/videoplayback?c=web_embedded_player&cm2rm=sn-o09sd7s&cms_redirect=yes&cmsv=e&cnr=14&cpn=bik9wgahrc5jhvyi&cver=1.20220918.00.00&dur=5588.230&ei=umwry6pgdcon2lypre2i2am&expire=1663811866&fexp=24001373,24007246&id=o-adz2f4-l-smqnl8l31k7a7-cg3h0wt_ga8aikxgcxg2t&ip=137.184.177.15&itag=18&lmt=1643054906292951&lsig=ag3c_xawraigf5-rgf2xd0lozxcgzwxgoxqhqzmgwxa-vdytr57h5lccieavokgnaie3zdafqho7fbimphutntsriqa9kdgbdema&lsparams=mh,mm,mn,ms,mv,mvi,pl&mh=gi&mime=video/mp4&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&n=1kzdgoh5orw1ag&ns=hgbb8j72m15xhohixiz33qsi&pl=20&pltype=contentugc&ptk=youtube_none&ratebypass=yes&redirect_counter=1&req_id=7066179c6559a3ee&requiressl=yes&sig=aoq0qj8wrqigi72ue5jztjgbj2jadhg0fmfpndh_ec1gpi7il-a95riciqd4jentwa5oqsixjisc_ijd1dscp3yxazfac6xaxhsw7w==&source=youtube&sparams=expire,ei,ip,id,itag,source,requiressl,spc,vprv,mime,ns,cnr,ratebypass,dur,lmt&spc=yr2vp5oyfrtwxtxwdiqfteeej98xy1c&txp=6218224&vprv=1", "timestamp": "20220921195748", "url": "https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA", "mime": "video/mp4", "status": "200", "digest": "sha1:XWXMGAQPGUJI6FHCAWLI6ABQQ5F5Y6OB", "length": "161861539", "offset": "383819330", "filename": "qg137zm7702/sv/558/gk/6917/don-t-die-peter-chan-rec-20220921195258533131-crawl-manual-20220921195228-385b66de-86c-0.warc.gz", "source": "was:level2.cdxj", "source-coll": "was", "access": "allow"}
{"urlkey": "com,googlevideo,rr4---sn-a5mekn6s)/videoplayback?c=web_embedded_player&cm2rm=sn-o09sd7s&cms_redirect=yes&cmsv=e&cnr=14&cpn=bik9wgahrc5jhvyi&cver=1.20220918.00.00&dur=5588.230&ei=umwry6pgdcon2lypre2i2am&expire=1663811866&fexp=24001373,24007246&id=o-adz2f4-l-smqnl8l31k7a7-cg3h0wt_ga8aikxgcxg2t&ip=137.184.177.15&itag=18&lmt=1643054906292951&lsig=ag3c_xawraigf5-rgf2xd0lozxcgzwxgoxqhqzmgwxa-vdytr57h5lccieavokgnaie3zdafqho7fbimphutntsriqa9kdgbdema&lsparams=mh,mm,mn,ms,mv,mvi,pl&mh=gi&mime=video/mp4&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&n=1kzdgoh5orw1ag&ns=hgbb8j72m15xhohixiz33qsi&pl=20&pltype=contentugc&ptk=youtube_none&ratebypass=yes&redirect_counter=1&req_id=7066179c6559a3ee&requiressl=yes&sig=aoq0qj8wrqigi72ue5jztjgbj2jadhg0fmfpndh_ec1gpi7il-a95riciqd4jentwa5oqsixjisc_ijd1dscp3yxazfac6xaxhsw7w==&source=youtube&sparams=expire,ei,ip,id,itag,source,requiressl,spc,vprv,mime,ns,cnr,ratebypass,dur,lmt&spc=yr2vp5oyfrtwxtxwdiqfteeej98xy1c&txp=6218224&vprv=1", "timestamp": "20220921195748", "url": "https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA", "mime": "video/mp4", "status": "200", "digest": "sha1:XWXMGAQPGUJI6FHCAWLI6ABQQ5F5Y6OB", "length": "161861539", "offset": "383819330", "filename": "qg137zm7702/yh/223/zs/4674/don-t-die-peter-chan-20220921202603400-385b66de-86c-0-rec-20220921195258533131-crawl-manual-20220921195228-385b66de-86c-0.warc.gz", "source": "was:level2.cdxj", "source-coll": "was", "access": "allow"}
{"urlkey": "com,googlevideo,rr4---sn-a5mekn6s)/videoplayback?c=web_embedded_player&cm2rm=sn-o09sd7s&cms_redirect=yes&cmsv=e&cnr=14&cpn=bik9wgahrc5jhvyi&cver=1.20220918.00.00&dur=5588.230&ei=umwry6pgdcon2lypre2i2am&expire=1663811866&fexp=24001373,24007246&id=o-adz2f4-l-smqnl8l31k7a7-cg3h0wt_ga8aikxgcxg2t&ip=137.184.177.15&itag=18&lmt=1643054906292951&lsig=ag3c_xawraigf5-rgf2xd0lozxcgzwxgoxqhqzmgwxa-vdytr57h5lccieavokgnaie3zdafqho7fbimphutntsriqa9kdgbdema&lsparams=mh,mm,mn,ms,mv,mvi,pl&mh=gi&mime=video/mp4&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&n=1kzdgoh5orw1ag&ns=hgbb8j72m15xhohixiz33qsi&pl=20&pltype=contentugc&ptk=youtube_none&ratebypass=yes&redirect_counter=1&req_id=7066179c6559a3ee&requiressl=yes&sig=aoq0qj8wrqigi72ue5jztjgbj2jadhg0fmfpndh_ec1gpi7il-a95riciqd4jentwa5oqsixjisc_ijd1dscp3yxazfac6xaxhsw7w==&source=youtube&sparams=expire,ei,ip,id,itag,source,requiressl,spc,vprv,mime,ns,cnr,ratebypass,dur,lmt&spc=yr2vp5oyfrtwxtxwdiqfteeej98xy1c&txp=6218224&vprv=1", "timestamp": "20220921195748", "url": "https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA", "mime": "video/mp4", "status": "200", "digest": "sha1:XWXMGAQPGUJI6FHCAWLI6ABQQ5F5Y6OB", "length": "161861539", "offset": "383819330", "filename": "qg137zm7702/yx/015/cx/1366/don-t-die-peter-chan-20220921202603400-385b66de-86c-0-rec-20220921195258533131-crawl-manual-20220921195228-385b66de-86c-0.warc.gz", "source": "was:level2.cdxj", "source-coll": "was", "access": "allow"}

The last two lines have identical properties except for filename. I wonder if this could be causing problems for fuzzy matching?

peterchanws commented 1 year ago

Should we try to remove the duplicated entries?

edsu commented 1 year ago

Yes, I'm going to try to copy these WARCs out of the production environment and see if I can replicate the problem in my development environment. If that works then I can experiment with removing the duplicates to see if that helps? I get nervous about experimenting with the CDX files in production!

edsu commented 1 year ago

I've been able to replicate the problem in my development environment by downloading the WARCs for sv558gk6917, yh223zs4674 and yx015cx1366 and indexing them with cdxj-indexer. At this point playback works fine, even with the duplicate WARC content.

But then I copied the CDX entries from our production level3.cdxj that matched ^com,googlevideo, and added them to my development index as well (8,323,879 index entries). After this the request above that fails in production but not in stage results in a 7 minute wait for a response.

http://localhost:8080/test/20220921195743oe_/https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA

In our production environment uwsgi is giving up on this request because it is taking too long, and realistically our users won't wait 7 minutes for a response anyway.

Here is what I see in the pywb log when running it manually with the wayback command:

127.0.0.1 - - [2022-10-07 03:11:05] "POST /test/resource/postreq?url=https%3A%2F%2Frr3---sn-o097znsl.googlevideo.com%2Fvideoplayback%3Fexpire%3D1663811866%26ei%3DumwrY6PgDcON2LYPre2I2AM%26ip%3D137.184.177.15%26id%3Do-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t%26itag%3D18%26source%3Dyoutube%26requiressl%3Dyes%26mh%3Dgi%26mm%3D31%252C29%26mn%3Dsn-o097znsl%252Csn-n4v7snls%26ms%3Dau%252Crdu%26mv%3Dm%26mvi%3D3%26pl%3D20%26initcwndbps%3D227500%26spc%3DyR2vp5OyfrTwxTXwDIqftEEeJ98XY1c%26vprv%3D1%26mime%3Dvideo%252Fmp4%26ns%3DHGBB8J72m15xHOhIxIZ33QsI%26cnr%3D14%26ratebypass%3Dyes%26dur%3D5588.230%26lmt%3D1643054906292951%26mt%3D1663789767%26fvip%3D4%26fexp%3D24001373%252C24007246%26c%3DWEB_EMBEDDED_PLAYER%26txp%3D6218224%26n%3D1kZdGOH5ORW1ag%26sparams%3Dexpire%252Cei%252Cip%252Cid%252Citag%252Csource%252Crequiressl%252Cspc%252Cvprv%252Cmime%252Cns%252Ccnr%252Cratebypass%252Cdur%252Clmt%26sig%3DAOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%253D%253D%26lsparams%3Dmh%252Cmm%252Cmn%252Cms%252Cmv%252Cmvi%252Cpl%252Cinitcwndbps%26lsig%3DAG3C_xAwRQIgHI4spdAtiH_OC3nQts77b8KCHteX4wt3SADv23oi3NkCIQDbg4ga_E4nug6E_xhKh-bhX1K7BRlCjIhlIoxIwrr4EQ%253D%253D%26cpn%3D8lLBIK9wGaHrC5jh%26cver%3D1.20220918.00.00%26ptk%3Dyoutube_none%26pltype%3Dcontentugc&closest=20220921195743&matchType=exact HTTP/1.1" 200 7501 420.605267
127.0.0.1 - - [2022-10-07 03:11:05] "GET /test/20220921195743oe_/https://rr3---sn-o097znsl.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&mh=gi&mm=31%2C29&mn=sn-o097znsl%2Csn-n4v7snls&ms=au%2Crdu&mv=m&mvi=3&pl=20&initcwndbps=227500&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&mt=1663789767&fvip=4&fexp=24001373%2C24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgHI4spdAtiH_OC3nQts77b8KCHteX4wt3SADv23oi3NkCIQDbg4ga_E4nug6E_xhKh-bhX1K7BRlCjIhlIoxIwrr4EQ%3D%3D&cpn=8lLBIK9wGaHrC5jh&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc HTTP/1.1" 302 7061 420.617071

I'm not sure why the one requests shows up as two (a POST and a GET) in the log. I wonder if there is something we can be doing here to speed things up. I'm kind of confused why if the CDX lookup is quick why replay is not.

peterchanws commented 1 year ago

What happens if you remove duplicates in the development environment? Would it work normal again?

edsu commented 1 year ago

No, the duplicates are actually not negatively impacting the replay. The problem is the amount of time pywb is spending looking through the googlevideo.com CDX entries. The good news that now that we can replicate it outside of the production environment it'll be easier to investigate what is going on, and maybe see if there is an optimization to be made.

edsu commented 1 year ago

A possibly related pywb issue: https://github.com/webrecorder/pywb/issues/573

peterchanws commented 1 year ago

Another case on same issue: https://swap.stanford.edu/was/20230505172005/https://library.stanford.edu/projects/player-piano-program

edsu commented 7 months ago

Another case: https://swap.stanford.edu/was/20240223040850/https://einstein.stanford.edu/Media/GP-B_Nutshell_Video.html