jannisborn / covid19_ultrasound

Open source lung ultrasound (LUS) data collection initiative for COVID-19.
https://www.mdpi.com/2076-3417/11/2/672

get_and_process_web_data.sh deprecated and cropping problems #116

Closed ivolis closed 1 year ago

ivolis commented 1 year ago

Hi! I am having a lot of trouble running "get_and_process_web_data.sh" following the README instructions.

Some deprecated links

youtube-dl -f 134 -o "tmp/pocus_videos/convex/Reg-Youtube.mp4" https://www.youtube.com/watch\?v\=VzgX9ihnmec\&ab_channel\=EM:RAPProductions

# >> OUTPUT << #

get_and_process_web_data.sh: 24: get_and_process_web_data.sh: youtube-dl: Permission denied
wget -O "tmp/pocus_videos/convex/Reg_Alines-1-90.mov" https://hls.cf.brightcove.com/1611106596001/4304256093001/1611106596001_4304256093001_s-1.ts\?pubId\=1611106596001\&videoId\=4304084884001
wget -O "tmp/pocus_videos/convex/Pneu_AIR BRONC2.mov" https://f1.media.brightcove.com/3/1611106596001/4304256034001/1611106596001_4304256034001_s-1.ts\?pubId\=1611106596001\&videoId\=4304052567001

# >> OUTPUT << #

Connecting to hls.cf.brightcove.com (hls.cf.brightcove.com)|13.227.83.103|:443... connected.
HTTP request sent, awaiting response... HTTP request sent, awaiting response... HTTP request sent, awaiting response... 502 Bad Gateway
2023-06-22 12:57:56 ERROR 502: Bad Gateway.

--2023-06-22 12:57:56--  https://f1.media.brightcove.com/3/1611106596001/4304256034001/1611106596001_4304256034001_s-1.ts?pubId=1611106596001&videoId=4304052567001
Resolving f1.media.brightcove.com (f1.media.brightcove.com)... 151.101.218.27
Connecting to f1.media.brightcove.com (f1.media.brightcove.com)|151.101.218.27|:443... connected.
HTTP request sent, awaiting response... 503 Backend is unhealthy
2023-06-22 12:57:56 ERROR 503: Backend is unhealthy.
wget -O tmp/pocus_images/convex/Reg_com_acquired_paper.png https://www.karger.com/WebMaterial/ShowPic/150968
wget -O tmp/pocus_images/convex/Pneu_air_bronchogram_com_acquired_paper.png https://www.karger.com/WebMaterial/ShowPic/150968

# >> OUTPUT << #

--2023-06-22 12:58:36--  https://www.karger.com/WebMaterial/ShowPic/150968
Resolving www.karger.com (www.karger.com)... 20.246.173.94
Connecting to www.karger.com (www.karger.com)|20.246.173.94|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://karger.com/WebMaterial/ShowPic/150968 [following]
--2023-06-22 12:58:37--  https://karger.com/WebMaterial/ShowPic/150968
Resolving karger.com (karger.com)... 20.246.173.94
Connecting to karger.com (karger.com)|20.246.173.94|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-06-22 12:58:38 ERROR 404: Not Found.

--2023-06-22 12:58:38--  https://www.karger.com/WebMaterial/ShowPic/150968
Resolving www.karger.com (www.karger.com)... 20.246.173.94
Connecting to www.karger.com (www.karger.com)|20.246.173.94|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://karger.com/WebMaterial/ShowPic/150968 [following]
--2023-06-22 12:58:38--  https://karger.com/WebMaterial/ShowPic/150968
Resolving karger.com (karger.com)... 20.246.173.94
Connecting to karger.com (karger.com)|20.246.173.94|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-06-22 12:58:39 ERROR 404: Not Found.
wget -O tmp/pocus_images/convex/Cov_siemens_1.png https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/documents/image/mda5/nzg3/~edisp/jun-5c1-lung-hepatization-07275343/~renditions/jun-5c1-lung-hepatization-07275343~8.jpg
wget -O tmp/pocus_images/convex/Cov_siemens_2.png https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/documents/image/mda5/nzg3/~edisp/jun-5c1-lung-4-b-lines-07275344/~renditions/jun-5c1-lung-4-b-lines-07275344~8.jpg 
wget -O tmp/pocus_images/linear/Cov_siemens_3.png https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/documents/image/mda5/nzg3/~edisp/seq-10l4-lung-b-lines-pleuritis-07275337/~renditions/seq-10l4-lung-b-lines-pleuritis-07275337~8.jpg
wget -O tmp/pocus_videos/linear/Reg_siemens_vid_3.mp4 "https://house-fastly-signed-eu-west-1-prod.brightcovecdn.com/media/v1/hls/v4/clear/2744552178001/5ef91327-742a-48cb-9ade-d04374e5dcac/49587bc4-0dac-48b4-ba3a-93d41d44cbf0/5x/segment0.ts?fastly_token=NjAwNGU4ZjNfMTQxNWMwMjBlNjI0NTEwMTBhNzE3OTBiYTY2M2Q3NWNmZTdkMDM4NDIwNzhjNDVhNmVhOWM0YzMzZTY4ZGMxN18vL2hvdXNlLWZhc3RseS1zaWduZWQtZXUtd2VzdC0xLXByb2QuYnJpZ2h0Y292ZWNkbi5jb20vbWVkaWEvdjEvaGxzL3Y0L2NsZWFyLzI3NDQ1NTIxNzgwMDEvNWVmOTEzMjctNzQyYS00OGNiLTlhZGUtZDA0Mzc0ZTVkY2FjLzQ5NTg3YmM0LTBkYWMtNDhiNC1iYTNhLTkzZDQxZDQ0Y2JmMC8%3D"

# >> OUTPUT << #

--2023-06-22 12:59:04--  https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/documents/image/mda5/nzg3/~edisp/jun-5c1-lung-hepatization-07275343/~renditions/jun-5c1-lung-hepatization-07275343~8.jpg
Resolving static.healthcare.siemens.com (static.healthcare.siemens.com)... failed: No address associated with hostname.
wget: unable to resolve host address ‘static.healthcare.siemens.com’
--2023-06-22 12:59:04--  https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/documents/image/mda5/nzg3/~edisp/jun-5c1-lung-4-b-lines-07275344/~renditions/jun-5c1-lung-4-b-lines-07275344~8.jpg
Resolving static.healthcare.siemens.com (static.healthcare.siemens.com)... failed: No address associated with hostname.
wget: unable to resolve host address ‘static.healthcare.siemens.com’
--2023-06-22 12:59:04--  https://static.healthcare.siemens.com/siemens_hwem-hwem_ssxa_websites-context-root/wcm/idc/groups/public/@global/documents/image/mda5/nzg3/~edisp/seq-10l4-lung-b-lines-pleuritis-07275337/~renditions/seq-10l4-lung-b-lines-pleuritis-07275337~8.jpg
Resolving static.healthcare.siemens.com (static.healthcare.siemens.com)... failed: No address associated with hostname.
wget: unable to resolve host address ‘static.healthcare.siemens.com’
--2023-06-22 12:59:04--  https://house-fastly-signed-eu-west-1-prod.brightcovecdn.com/media/v1/hls/v4/clear/2744552178001/5ef91327-742a-48cb-9ade-d04374e5dcac/49587bc4-0dac-48b4-ba3a-93d41d44cbf0/5x/segment0.ts?fastly_token=NjAwNGU4ZjNfMTQxNWMwMjBlNjI0NTEwMTBhNzE3OTBiYTY2M2Q3NWNmZTdkMDM4NDIwNzhjNDVhNmVhOWM0YzMzZTY4ZGMxN18vL2hvdXNlLWZhc3RseS1zaWduZWQtZXUtd2VzdC0xLXByb2QuYnJpZ2h0Y292ZWNkbi5jb20vbWVkaWEvdjEvaGxzL3Y0L2NsZWFyLzI3NDQ1NTIxNzgwMDEvNWVmOTEzMjctNzQyYS00OGNiLTlhZGUtZDA0Mzc0ZTVkY2FjLzQ5NTg3YmM0LTBkYWMtNDhiNC1iYTNhLTkzZDQxZDQ0Y2JmMC8%3D
Resolving house-fastly-signed-eu-west-1-prod.brightcovecdn.com (house-fastly-signed-eu-west-1-prod.brightcovecdn.com)... 146.75.126.27, 2a04:4e42:8f::539
Connecting to house-fastly-signed-eu-west-1-prod.brightcovecdn.com (house-fastly-signed-eu-west-1-prod.brightcovecdn.com)|146.75.126.27|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized

Username/Password Authentication Failed.
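
The failures above fall into a few groups: server errors (502, 503), pages that no longer exist (404), access restrictions (401) and a host name that no longer resolves. Below is a minimal sketch for triaging the script's URLs before running the full download. It assumes curl is installed; the helper names classify_status and check_url are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a coarse label.
classify_status() {
    case "$1" in
        2??) echo "ok" ;;
        3??) echo "redirect" ;;
        401|403) echo "access-denied" ;;
        404|410) echo "gone" ;;
        5??) echo "server-error" ;;
        *) echo "unreachable" ;;
    esac
}

# Hypothetical helper: request a URL with curl and print label, code and URL.
# curl reports 000 as the status when no HTTP response was received,
# for example when the host name cannot be resolved.
check_url() {
    code=$(curl -s -o /dev/null -L --max-time 20 -w '%{http_code}' "$1")
    printf '%s\t%s\t%s\n' "$(classify_status "$code")" "$code" "$1"
}
```

Calling check_url once per URL in the script prints one tab-separated line per link, which makes it easier to see which sources need a replacement and which might only be temporarily down.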

Some weird youtube-dl and wget outputs

youtube-dl -f 136 -o "tmp/pocus_videos/linear/Reg-Youtube-start20sec.mp4" https://www.youtube.com/watch?v=Qd-26HdJP6I&ab
youtube-dl -f 136 -o "tmp/pocus_videos/linear/Pneu-Youtube-start20sec.mp4" https://www.youtube.com/watch?v=Qd-26HdJP6I&ab
youtube-dl -f 18 -o "tmp/pocus_videos/linear/Reg-Youtube-Video_902_Lung_POCUS-left.mp4" https://www.youtube.com/watch?v=HqPXJ0A0HCU&ab

# >> OUTPUT << #

get_and_process_web_data.sh: 25: get_and_process_web_data.sh: ab: not found
get_and_process_web_data.sh: 26: get_and_process_web_data.sh: ab: not found
get_and_process_web_data.sh: 27: get_and_process_web_data.sh: ab: not found
get_and_process_web_data.sh: 25: get_and_process_web_data.sh: youtube-dl: Permission denied
get_and_process_web_data.sh: 26: get_and_process_web_data.sh: youtube-dl: Permission denied
get_and_process_web_data.sh: 27: get_and_process_web_data.sh: youtube-dl: Permission denied
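
A "Permission denied" message for a command name usually means the shell found a file with that name but could not execute it, most often because the execute bit is missing. Below is a minimal sketch for checking this. The helper name find_on_path is hypothetical; it only inspects PATH and does not change anything.

```shell
# Hypothetical helper: look for a file named "$1" in each PATH directory
# and report whether it carries the execute bit.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$old_ifs
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ]; then
            if [ -x "$dir/$name" ]; then
                echo "executable: $dir/$name"
            else
                echo "not executable: $dir/$name"
            fi
            return 0
        fi
    done
    IFS=$old_ifs
    echo "not found: $name"
    return 1
}
```

Running find_on_path youtube-dl shows the first match on PATH. If it is reported as not executable, chmod +x on that path (or reinstalling the tool) may resolve the error.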

Maybe the &ab suffix doesn't need to be in the YouTube links?
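
In a POSIX shell an unquoted & ends the command and runs it in the background, so the text after it (ab) is parsed as a new command. That would match the "ab: not found" lines. Below is a minimal sketch of two ways around it: quoting the URL, or dropping everything from the first & onward. The variable names are illustrative, and whether the dropped query parameters matter for the download is an assumption to verify.

```shell
# URL as it appears in the script, kept inside single quotes so the shell
# does not interpret & or ?.
url='https://www.youtube.com/watch?v=Qd-26HdJP6I&ab'

# Drop everything from the first & onward (POSIX parameter expansion).
clean_url=${url%%&*}
echo "$clean_url"

# Either form can then be passed as one quoted argument, for example:
# youtube-dl -f 136 -o "tmp/pocus_videos/linear/Reg-Youtube-start20sec.mp4" "$clean_url"
```

Quoting alone is the smaller change, since it leaves the URL untouched and only stops the shell from splitting the line.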

wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid1.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0001-VideoS1.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid2.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0002-VideoS2.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid3.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0003-VideoS3.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid4.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0004-VideoS4.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid5.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0005-VideoS5.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid6.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0006-VideoS6.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid7.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0007-VideoS7.avi"
wget -O tmp/pocus_videos/convex/Cov_new_pregnant_vid8.avi "https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0008-VideoS8.avi"

# >> OUTPUT << #

--2023-06-22 12:58:34--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0001-VideoS1.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.129.87, 162.159.130.87, 2606:4700:7::a29f:8157, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.129.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:34 ERROR 403: Forbidden.

--2023-06-22 12:58:34--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0002-VideoS2.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.130.87, 162.159.129.87, 2606:4700:7::a29f:8157, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.130.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:34 ERROR 403: Forbidden.

--2023-06-22 12:58:34--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0003-VideoS3.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.129.87, 162.159.130.87, 2606:4700:7::a29f:8257, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.129.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:35 ERROR 403: Forbidden.

--2023-06-22 12:58:35--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0004-VideoS4.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.130.87, 162.159.129.87, 2606:4700:7::a29f:8157, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.130.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:35 ERROR 403: Forbidden.

--2023-06-22 12:58:35--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0005-VideoS5.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.129.87, 162.159.130.87, 2606:4700:7::a29f:8257, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.129.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:35 ERROR 403: Forbidden.

--2023-06-22 12:58:35--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0006-VideoS6.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.130.87, 162.159.129.87, 2606:4700:7::a29f:8157, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.130.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:35 ERROR 403: Forbidden.

--2023-06-22 12:58:36--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0007-VideoS7.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.129.87, 162.159.130.87, 2606:4700:7::a29f:8257, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.129.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:36 ERROR 403: Forbidden.

--2023-06-22 12:58:36--  https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fjum.15367&file=jum15367-sup-0008-VideoS8.avi
Resolving onlinelibrary.wiley.com (onlinelibrary.wiley.com)... 162.159.130.87, 162.159.129.87, 2606:4700:7::a29f:8157, ...
Connecting to onlinelibrary.wiley.com (onlinelibrary.wiley.com)|162.159.130.87|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:58:36 ERROR 403: Forbidden.

Perhaps it has something to do with these being .avi files that download directly? I'm not sure.

wget -O tmp/pocus_videos/linear/Cov_linear_abrams_2020_v1.mp4 https://www.jem-journal.com/cms/10.1016/j.jemermed.2020.06.032/attachment/69e81993-824e-4a79-9b61-8683b242a328/mmc1.mp4

# >> OUTPUT << #

--2023-06-22 12:59:03--  https://www.jem-journal.com/cms/10.1016/j.jemermed.2020.06.032/attachment/69e81993-824e-4a79-9b61-8683b242a328/mmc1.mp4
Resolving www.jem-journal.com (www.jem-journal.com)... 104.18.124.114, 104.18.123.114
Connecting to www.jem-journal.com (www.jem-journal.com)|104.18.124.114|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-22 12:59:04 ERROR 403: Forbidden.

Cropping process

When the script runs the cropping step, I get a lot of errors and warnings. Most of them look like this:

pocus_videos/convex/Reg-nephropocus.gif [[25, 90, 250], [0, 30.0]]
[ERROR:0@0.253] global cap.cpp:166 open VIDEOIO(CV_IMAGES): raised OpenCV exception:

OpenCV(4.7.0) /tmp/pip-install-almfh048/opencv-contrib-python/opencv/modules/videoio/src/cap_images.cpp:253: error: (-5:Bad argument) CAP_IMAGES: can't find starting number (in the name of file): tmp/pocus_videos/convex/Reg-nephropocus.gif in function 'icvExtractPattern'

Problem reading file: tmp/pocus_videos/convex/Reg-nephropocus.gif
pocus_videos/convex/Reg-Youtube.mp4 [[40, 150, 280], [2299, 2429]]
Problem reading file: tmp/pocus_videos/convex/Reg-Youtube.mp4
pocus_videos/convex/pneu-everyday.gif [[30, 90, 340], [0, 61.0]]

Is there any fix for this, at least on my end?
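
One plausible cause, not confirmed in this thread, is that the cropping step tries to read files whose download failed earlier. wget -O opens the output file before the transfer starts, so a failed download can leave a zero-byte file behind, and a failed youtube-dl call leaves no file at all. Below is a minimal sketch that lists the empty files. The helper name list_empty_downloads is hypothetical, and the tmp directory is taken from the paths in the log above.

```shell
# Hypothetical helper: print every zero-byte regular file below a directory.
list_empty_downloads() {
    find "$1" -type f -size 0 | sort
}
```

Running list_empty_downloads tmp from the directory where the script was started shows which inputs are empty. Files that appear in the cropping log but do not exist under tmp at all would point to the same kind of download failure.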

jannisborn commented 1 year ago

Hi,

Thanks for your interest in the repo and for reporting this.