Feed exports <topics-feed-exports> now support Google Cloud Storage <topics-feed-storage-gcs> as a storage backend
The new FEED_EXPORT_BATCH_ITEM_COUNT setting makes it possible to deliver output items in batches of up to the specified number of items.
It also serves as a workaround for delayed file delivery <delayed-file-delivery>, which causes Scrapy to start item delivery only after the crawl has finished when using certain storage backends (S3 <topics-feed-storage-s3>, FTP <topics-feed-storage-ftp>, and now GCS <topics-feed-storage-gcs>).
The base implementation of item loaders <topics-loaders> has been moved into a separate library, itemloaders <itemloaders:index>, allowing usage from outside Scrapy and a separate release schedule.
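Batched delivery can be configured like this (a minimal settings sketch; the file name `items-%(batch_id)d.json` and the count of 100 are arbitrary choices for illustration, not defaults):

```python
# settings.py sketch: deliver output in batches of up to 100 items.
# When batching is enabled, the feed URI must contain a placeholder
# such as %(batch_id)d or %(batch_time)s so each batch gets its own file.
FEED_EXPORT_BATCH_ITEM_COUNT = 100

FEEDS = {
    "items-%(batch_id)d.json": {
        "format": "json",
    },
}
```

Scrapy fills the `%(batch_id)d` placeholder with printf-style formatting, producing `items-1.json`, `items-2.json`, and so on as each batch closes.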
Deprecation removals
Removed the following classes and their parent modules from scrapy.linkextractors:
htmlparser.HtmlParserLinkExtractor
regex.RegexLinkExtractor
sgml.BaseSgmlLinkExtractor
sgml.SgmlLinkExtractor
Use LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor> instead (4356, 4679)
Deprecations
The scrapy.utils.python.retry_on_eintr function is now deprecated (4683)
New features
Feed exports <topics-feed-exports> support Google Cloud Storage <topics-feed-storage-gcs> (685, 3608)
New FEED_EXPORT_BATCH_ITEM_COUNT setting for batch deliveries (4250, 4434)
The parse command now allows specifying an output file (4317, 4377)
Request.from_curl <scrapy.http.Request.from_curl> and ~scrapy.utils.curl.curl_to_request_kwargs now also support --data-raw (4612)
A parse callback may now be used in built-in spider subclasses, such as ~scrapy.spiders.CrawlSpider (712, 732, 781, 4254)
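To illustrate the `--data-raw` support and the implicit-POST behavior also noted under bug fixes, here is a self-contained sketch of the idea behind curl-to-request conversion (this toy parser handles just a few flags and is not Scrapy's implementation; the real entry points are `scrapy.http.Request.from_curl` and `scrapy.utils.curl.curl_to_request_kwargs`):

```python
import shlex


def curl_to_kwargs_sketch(curl_command):
    """Toy illustration: pull URL, method, and body out of a curl command,
    including --data-raw; a body with no explicit method implies POST."""
    tokens = shlex.split(curl_command)
    url = method = body = None
    i = 1  # skip the leading "curl"
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-X", "--request"):
            i += 1
            method = tokens[i]
        elif tok in ("-d", "--data", "--data-raw"):
            i += 1
            body = tokens[i]
        elif not tok.startswith("-"):
            url = tok
        i += 1
    if method is None:
        # Mirror curl's behavior: sending a body defaults the method to POST.
        method = "POST" if body is not None else "GET"
    return {"url": url, "method": method, "body": body}
```

With Scrapy installed, `Request.from_curl("curl https://example.com --data-raw 'a=1'")` accepts the same kind of command string and now applies the same defaulting.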
Bug fixes
Fixed the CSV exporting <topics-feed-format-csv> of dataclass items <dataclass-items> and attr.s items <attrs-items> (4667, 4668)
Request.from_curl <scrapy.http.Request.from_curl> and ~scrapy.utils.curl.curl_to_request_kwargs now set the request method to POST when a request body is specified and no request method is specified (4612)
The processing of ANSI escape sequences is now enabled on Windows 10.0.14393 and later, where it is required for colored output (4393, 4403)
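The dataclass fix can be pictured with a stdlib-only sketch: the CSV header is now derived from the item's declared field names (via `ItemAdapter.field_names`) rather than from whichever keys the first item happens to populate. The `Product` class and `export_csv` helper below are invented for illustration:

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class Product:
    name: str = ""
    price: float = 0.0


def export_csv(items):
    # Mirror the fix: take the header from the declared dataclass fields,
    # so the columns are stable even when some values are left at defaults.
    names = [f.name for f in fields(Product)]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for item in items:
        writer.writerow({n: getattr(item, n) for n in names})
    return buf.getvalue()
```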
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
If all status checks pass Dependabot will automatically merge this pull request.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme
Additionally, you can set the following in the `.dependabot/config.yml` file in this repo:
- Update frequency
- Automerge options (never/patch/minor, and dev/runtime dependencies)
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)
Bumps scrapy from 2.2.1 to 2.3.0.
Release notes
Sourced from scrapy's releases.
Changelog
Sourced from scrapy's changelog.
Commits
1278e76 Bump version: 2.2.0 → 2.3.0
3600582 Cover Scrapy 2.2.1 and 2.3 in the release notes (#4708)
015b71d Merge pull request #4704 from Gallaecio/python2-u-prefixes
890b213 Remove the u prefix from strings
5e2d1bd Merge pull request #4434 from BroodingKangaroo/ISSUE-4250-add_batch_deliveries
5265853 Use ItemAdapter.field_names when writing header in CsvItemExporter (#4668)
a6c1d79 pep8 tiny changes
ce0c25f Merge pull request #4690 from elacuesta/typing-setup-remove-monkeypatches
8fae3d5 Remove monkeypatches module from mypy section in setup.cfg
f3372a3 Merge pull request #4254 from elacuesta/spider.parse