Current usage in customer import (Open Source Edition)

\Magento\CustomerImportExport\Model\Import\Address::_getNextEntityId()
Only called in case no address id is specified for the address that is being imported. The result is then used to predict the id of the address that is being imported. The AUTO_INCREMENT is only fetched once, in case it has not been fetched before. Every further call returns the cached value and increments it (post-increment) for the next call. This could lead to off-by-N issues.

\Magento\CustomerImportExport\Model\Import\Customer::_getNextEntityId()
Only called in case no customer id is specified for the customer that is being imported. The result is then used to predict the id of the customer that is being imported. The AUTO_INCREMENT is only fetched once, in case it has not been fetched before. Every further call returns the cached value and increments it (post-increment) for the next call. This could lead to off-by-N issues.

\Magento\CustomerImportExport\Model\Import\CustomerComposite::__construct($data)
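Condensed, the pattern in both _getNextEntityId() methods looks roughly like this (a sketch, not the verbatim Magento source; the information_schema lookup stands in for the resource helper Magento actually uses):

protected function _getNextEntityId()
{
    if (!$this->_nextEntityId) {
        // Fetched exactly once per import run
        $this->_nextEntityId = (int)$this->_connection->fetchOne(
            "SELECT AUTO_INCREMENT
               FROM information_schema.TABLES
              WHERE TABLE_SCHEMA = DATABASE()
                AND TABLE_NAME = 'customer_entity'"
        );
    }
    // Post-increment on the cached value: any concurrent INSERT into
    // customer_entity makes every later prediction off by N.
    return $this->_nextEntityId++;
}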
Ideas of Refactoring

1. Use a Sequence implementation from the Framework. Every Sequence follows Magento\Framework\DB\Sequence\SequenceInterface.

The benchmark for all concepts consists of importing 1,000 customers and taking the median of 3 measurements, starting from an unmodified system as the baseline.

Concept 1 - Use a CustomerSequence class implementing SequenceInterface
Advantage:
For import we only need to add an optional parameter to \Magento\CustomerImportExport\Model\Import\Customer::__construct.
Disadvantage:
Slower because it triggers many more queries; this will be worse if we are using an AWS database cluster for import.

Concept 2 - Magento\Framework\EntityManager\MetadataPool implementation
The sequence is wired up via di.xml. Advantage and disadvantage match Concept 1: again only \Magento\CustomerImportExport\Model\Import\Customer::__construct has to be touched, at the cost of additional queries per customer. A sketch of both concepts follows below.
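A minimal sketch of both concepts; only SequenceInterface, MetadataPool and EntityMetadata::generateIdentifier() are framework API, while the class name, the sequence_customer table and the helper function are assumptions for illustration:

use Magento\Framework\App\ResourceConnection;
use Magento\Framework\DB\Sequence\SequenceInterface;
use Magento\Framework\EntityManager\MetadataPool;

/**
 * Concept 1 (sketch, assumed naming): a dedicated customer sequence.
 */
class CustomerSequence implements SequenceInterface
{
    /** @var ResourceConnection */
    private $resource;

    public function __construct(ResourceConnection $resource)
    {
        $this->resource = $resource;
    }

    public function getCurrentValue()
    {
        return $this->resource->getConnection()->fetchOne(
            'SELECT MAX(sequence_value) FROM sequence_customer'
        );
    }

    public function getNextValue()
    {
        $connection = $this->resource->getConnection();
        // Framework-style sequences insert an empty row into a dedicated
        // auto-increment table and read the generated id back - the two
        // extra queries per new customer mentioned in the summary below.
        $connection->insert('sequence_customer', []);
        return $connection->lastInsertId('sequence_customer');
    }
}

/**
 * Concept 2 (sketch): resolve the id through the framework's metadata,
 * assuming the customer entity and its sequence are registered in the
 * MetadataPool via di.xml - which is what this concept proposes.
 */
function nextCustomerId(MetadataPool $metadataPool): string
{
    return $metadataPool
        ->getMetadata(\Magento\Customer\Api\Data\CustomerInterface::class)
        ->generateIdentifier();
}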
Summary Concept 1 and Concept 2
Both concepts add two queries per new customer.
If we now think about importing 200,000 new customers on AWS, it is much slower than before. For those reasons, we
need a concept that helps us to optimize the queries that we send to MySQL.
Concept 3 - optimize queries
We need to split the imported bunch by whether we already know the customer; we can achieve this with a query like the one below.
To implement this, we need to move magento2ce/app/code/Magento/CustomerImportExport/Model/Import/Customer.php:537-555 into a separate service class.
SELECT entity_id, email
FROM customer_entity
WHERE CONCAT(email,website_id) IN (
'malcolm85@gmail.com1',
'khuel@yahoo.com1',
'newcustomer@yahoo.com1'
);
Result:

entity_id | email
--------- | -------------------
4290      | khuel@yahoo.com
4288      | malcolm85@gmail.com
The queried data helps to find all customers that have already been created in Magento. With this information we can refactor the protected function _importData():
while ($bunch = $this->_dataSourceModel->getNextBunch()) {
    $this->prepareCustomerData($bunch);
    $entitiesToCreate = [];
    $entitiesToUpdate = [];
    $entitiesToDelete = [];
    $attributesToSave = [];
    $bunchDataByMail = [];
    $customerAddresses = [];
    foreach ($bunch as $rowNumber => $rowData) {
        if (!$this->validateRow($rowData, $rowNumber)) {
            continue;
        }
        if ($this->getErrorAggregator()->hasToBeTerminated()) {
            $this->getErrorAggregator()->addRowToSkip($rowNumber);
            continue;
        }
        $email = $rowData[self::COLUMN_EMAIL];
        // Lookup key matching CONCAT(email, website_id) in the query below
        $customerAddresses[] = $email . $rowData['website_id'];
        $bunchDataByMail[$email] = $rowData;
    }
    // One lookup query per bunch instead of per-row id queries
    $query = $this->_connection->select()->from(
        'customer_entity',
        ['entity_id', 'email']
    )->where('CONCAT(email, website_id) IN (?)', $customerAddresses);
    if ($this->getBehavior() == Import::BEHAVIOR_DELETE) {
        // entity_id (the first selected column) of every matched customer
        $entitiesToDelete = $this->_connection->fetchCol($query);
    } elseif ($this->getBehavior() == Import::BEHAVIOR_ADD_UPDATE) {
        // entity_id => email pairs of all customers that already exist
        $entitiesToUpdate = $this->_connection->fetchPairs($query);
        /* filter $bunchDataByMail by $entitiesToUpdate and split it into
           the two arrays $entitiesToUpdate and $entitiesToCreate
           (see the sketch below) */
    }
}
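The split that the inline comment leaves open could look like this (a sketch; the names $existingIdByEmail, $rowsToUpdate and $rowsToCreate are assumptions):

// $entitiesToUpdate holds entity_id => email pairs from fetchPairs(),
// $bunchDataByMail holds email => row data.
$existingIdByEmail = array_flip($entitiesToUpdate); // email => entity_id

$rowsToUpdate = [];
$rowsToCreate = [];
foreach ($bunchDataByMail as $email => $rowData) {
    if (isset($existingIdByEmail[$email])) {
        // Known customer: reuse the existing entity_id
        $rowData['entity_id'] = $existingIdByEmail[$email];
        $rowsToUpdate[] = $rowData;
    } else {
        // New customer: an entity_id still has to be generated
        $rowsToCreate[] = $rowData;
    }
}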
With these two arrays, we can use the rows keyed by email to create the data to import.
To generate new ids, I recommend adding functions that build on the framework's Sequence and EntityMetadata.
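For example (a sketch; assignNewEntityIds is an assumed name, only SequenceInterface::getNextValue() is framework API):

use Magento\Framework\DB\Sequence\SequenceInterface;

// Reserve an id for every row that creates a new customer:
// one sequence call per new row, none for updates.
function assignNewEntityIds(SequenceInterface $sequence, array $rowsToCreate): array
{
    foreach ($rowsToCreate as &$rowData) {
        $rowData['entity_id'] = $sequence->getNextValue();
    }
    unset($rowData);
    return $rowsToCreate;
}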
Concept 4 - Stored Function to generate Sequences

As a base I use the following technical article: https://www.percona.com/blog/2008/04/02/stored-function-to-generate-sequences/
A query that I ran before starting performance testing:
CREATE TABLE `sequence` (
  `type` varchar(20) NOT NULL,
  `value` int(10) unsigned NOT NULL,
  PRIMARY KEY (`type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO sequence VALUES ('customer', 1);

DELIMITER //
CREATE FUNCTION sequence(seq_type char(20)) RETURNS int
BEGIN
  UPDATE sequence SET value = last_insert_id(value + 1) WHERE type = seq_type;
  RETURN last_insert_id();
END
//
DELIMITER ;
Implementation of _getNextEntityId()
protected function _getNextEntityId()
{
    return $this->_connection->query("SELECT sequence('customer')")->fetchColumn();
}
Benchmark results:
Time: 3.92 sec
IO: 511 ms
CPU: 3.4 sec
Network: 1.15 MB
SQL queries: 538.5 ms, 1,084 requests
Summary

For all four concepts, we need a migration from the customer_entity table to the sequence table, something like migrateSequenceColumnData(customer_entity, entity_id); see the sketch below. I think we will get the most benefit from the implementation of Concept 3 because we are sending fewer queries to the database.
All concepts can be implemented in a backward-compatible way because we only touch constructors and protected functions for import.
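A minimal sketch of that migration, assuming the sequence table from Concept 4 (the function body is illustrative; only fetchOne() and insertOnDuplicate() are existing adapter API):

use Magento\Framework\DB\Adapter\AdapterInterface;

// Seed the sequence table with the current maximum of
// customer_entity.entity_id so that generated ids never
// collide with existing rows.
function migrateSequenceColumnData(AdapterInterface $connection, string $table, string $column): void
{
    $maxId = (int)$connection->fetchOne("SELECT MAX({$column}) FROM {$table}");
    $connection->insertOnDuplicate(
        'sequence',
        ['type' => 'customer', 'value' => $maxId],
        ['value']
    );
}

// Usage: migrateSequenceColumnData($connection, 'customer_entity', 'entity_id');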