netarchivesuite / so-me

Social Media harvests
Apache License 2.0

Expand Twitter resource extraction #7

Open tokee opened 4 years ago

tokee commented 4 years ago

The extraction rules for links to resources in https://github.com/netarchivesuite/webarchive-discovery/blob/some/warc-indexer/src/main/java/uk/bl/wa/analyser/payload/TwitterAnalyser.java should be kept in sync with the resource getter used for harvests, so that the harvester fetches every resource the indexer extracts a link to.

tokee commented 2 years ago

The resource URL extractor has been expanded to cover embedded videos. This does not fully close the issue.
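
For illustration, here is a minimal sketch of what extracting image and embedded-video URLs from a single tweet can look like. It is not the code used in so-me or in `TwitterAnalyser.java`. It assumes the Twitter API v1.1 JSON layout (`extended_entities.media[]`, with video URLs under `video_info.variants[]`), and the function name `extract_resource_urls` is hypothetical.

```python
def extract_resource_urls(tweet):
    """Collect image and embedded-video URLs from one tweet.

    Assumption: `tweet` is a dict parsed from Twitter API v1.1 JSON.
    """
    urls = []
    # extended_entities holds the full media list; fall back to entities.
    container = tweet.get("extended_entities") or tweet.get("entities") or {}
    for media in container.get("media", []):
        # Still image, or the preview image for a video or animated GIF.
        image = media.get("media_url_https") or media.get("media_url")
        if image:
            urls.append(image)
        # Embedded videos list one URL per encoding/bitrate variant.
        video_info = media.get("video_info") or {}
        for variant in video_info.get("variants", []):
            if variant.get("url"):
                urls.append(variant["url"])
    # Drop duplicates while preserving the order of first appearance.
    return list(dict.fromkeys(urls))
```

A harvester would typically call this once per tweet and queue the returned URLs for fetching. Keeping the list of inspected JSON fields in one place like this makes it easier to compare against the fields the indexer reads.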