magento / magento2

Refresh Lifetime Statistics destroy statistic data #33896

Open bernd-reindl opened 3 years ago

bernd-reindl commented 3 years ago

Preconditions (*)

Magento 2.4.x

Steps to reproduce (*)

  1. View multiple products as a guest or customer on the frontend from different browsers, so that there are multiple entries in the report_event and report_viewed_product_index tables.
  2. If you already have visitor logs and data in the report_event and report_viewed_product_index tables, that works as well.
  3. Log in to the Admin and refresh the lifetime statistics for product views from Reports -> STATISTICS -> Refresh Statistics.
  4. Open the Product Views report from Reports -> PRODUCTS -> Views.
  5. Select a range and show the report.
  6. You will see the product views.
  7. Now change the schedule expression of the visitor_clean cron job so that it runs immediately, and run bin/magento cron:run.
  8. You will see that data has been cleaned from customer_visitor and report_event.
  9. Now repeat steps 3 to 6. The product views are no longer shown, although the data in report_viewed_product_index still exists.

Expected result (*)

The report for viewed products must be shown, because the logs still exist in the table report_viewed_product_index.

Actual result (*)

The most viewed products report does not show the correct data.

Description

Every day at midnight, the cron job 'visitor_clean' cleans the visitor log. This is done by \Magento\Customer\Model\Visitor::clean(), which calls \Magento\Customer\Model\ResourceModel\Visitor::clean(). The module "Magento_Reports" has a plugin "afterClean" for \Magento\Customer\Model\ResourceModel\Visitor::clean(). This plugin removes all entries from 'report_event' that point to a non-existent entry in 'customer_visitor'. As a result, when you refresh the lifetime statistics afterwards, a lot of entries are missing.

Additional Information (*)

When cron cleans the visitor log, the report_event table is also cleaned, from app/code/Magento/Reports/Model/ResourceModel/Event.php:180. When we run the lifetime statistics refresh, it collects all data by joining the report_event table in app/code/Magento/Reports/Model/ResourceModel/Report/Product/Viewed.php:10

  1. The collection does not account for the data that was removed during the clean process.

m2-assistant[bot] commented 3 years ago

Hi @bernd-reindl. Thank you for your report. To help us process this issue please make sure that you provided the following information:

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, please, review the Magento Contributor Assistant documentation.

Please, add a comment to assign the issue: @magento I am working on this

m2-assistant[bot] commented 3 years ago

Hi @engcom-Delta. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

engcom-Delta commented 3 years ago

Hi @bernd-reindl, I tried to reproduce the issue with the provided steps, but I see a different entry for "Most Viewed". Could you confirm whether this is the expected result? (Screenshot attached.) Thanks

[Screenshot: RefreshLife] It would be helpful if you could provide detailed steps to reproduce it.

bernd-reindl commented 3 years ago

@engcom-Delta there is no data to refresh in the develop instance.

You need data in report_viewed_product_aggregated_daily, report_viewed_product_aggregated_monthly and report_viewed_product_aggregated_yearly that is older than one day.

Then you need an active cron job "visitor_clean", which runs every day at midnight. This cron job runs "\Magento\Customer\Model\Visitor::clean()", which calls "\Magento\Customer\Model\ResourceModel\Visitor::clean()". This method removes all entries from the table 'customer_visitor' that are older than "\Magento\Customer\Model\Visitor::getCleanTime()".

The module "Magento_Reports" has a plugin "\Magento\Reports\Model\Plugin\Log::afterClean()", which is called after "\Magento\Customer\Model\ResourceModel\Visitor::clean()" and removes all entries from the table 'report_event' that do not point to an existing entry in the table 'customer_visitor'.

When "Refresh Lifetime Statistics" runs now, the data in the report_viewed_product_aggregated_* tables is rebuilt from the data in the table 'report_event'. But a lot of entries are missing there after the log has been cleaned, so the data in the report_viewed_product_aggregated_* tables is corrupt after refreshing the statistics. Only the data newer than "\Magento\Customer\Model\Visitor::getCleanTime()" is correct.

m2-assistant[bot] commented 3 years ago

Hi @engcom-Lima. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

engcom-Lima commented 3 years ago

Hi @bernd-reindl,

It seems that the ‘Refresh Lifetime Statistics’ functionality is working as expected. If you don’t want to destroy the ‘Most Viewed’ data, you can deselect that option and select the others. The documentation for this is available at: https://docs.magento.com/user-guide/reports/statistics.html

What I don’t understand is the context of your problem. To help me understand the problem you are facing, can you please explain which behaviour you consider unexpected? Please also clarify the context.

bernd-reindl commented 3 years ago

Hi @engcom-Lima,

The refresh itself works correctly, but it works with wrong (damaged) data.

Let me try to explain.

Every time a user requests a product detail page, an entry is created in the table 'report_event'. This entry consists of the event_type_id (1 for catalog_product_view), a logged_at timestamp, an object_id (the product ID), a subject_id (visitor ID or customer ID), a subtype (0 = customer; 1 = visitor) and a store_id.
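To make the row layout described above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Magento's actual schema: the column types, the `event_id` key and the `log_product_view` helper are assumptions made for illustration; only the column names and their meanings are taken from the description.

```python
import sqlite3

# Simplified stand-in for the 'report_event' table.
# Column types and the event_id key are assumptions, not Magento's real DDL.
SCHEMA = """
CREATE TABLE report_event (
    event_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    logged_at     TEXT    NOT NULL,
    event_type_id INTEGER NOT NULL,  -- 1 = catalog_product_view
    object_id     INTEGER NOT NULL,  -- product ID
    subject_id    INTEGER NOT NULL,  -- visitor ID or customer ID
    subtype       INTEGER NOT NULL,  -- 0 = customer, 1 = visitor
    store_id      INTEGER NOT NULL
)
"""


def log_product_view(conn, product_id, subject_id, is_visitor, store_id, logged_at):
    """Record one product detail page view (hypothetical helper)."""
    conn.execute(
        "INSERT INTO report_event "
        "(logged_at, event_type_id, object_id, subject_id, subtype, store_id) "
        "VALUES (?, 1, ?, ?, ?, ?)",
        (logged_at, product_id, subject_id, 1 if is_visitor else 0, store_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# A guest (visitor 7) views product 42 in store 1.
log_product_view(conn, product_id=42, subject_id=7, is_visitor=True,
                 store_id=1, logged_at="2021-09-19 10:15:00")

row = conn.execute(
    "SELECT event_type_id, object_id, subject_id, subtype, store_id "
    "FROM report_event"
).fetchone()
print(row)
```

The important detail for this issue is that a visitor's rows are tied to 'customer_visitor' only through subject_id together with subtype = 1.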

An entry in the table 'report_viewed_product_index' is also created when a user requests the product detail page.

At midnight, the cron job "visitor_clean" (see crontab.xml of Magento_Customer) runs and removes all entries from the table 'customer_visitor' that are older than 'Visitor::getCleanTime()'.

The Magento_Reports plugin on Visitor::clean() then calls \Magento\Reports\Model\Event::clean(), which removes all entries from the table 'report_event' that have the subtype 1 (visitor) and whose subject_id no longer exists in the table 'customer_visitor'.

Now all entries from the table 'report_event' older than 'Visitor::getCleanTime()' are removed. So the entries in the table 'report_event' differ from those in the table 'report_viewed_product_index'.

Running ‘Refresh Lifetime Statistics’ aggregates the data from the table 'report_event'. But a lot of data is missing from this table because the logs were cleaned.
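The effect of the two cleanup steps can be reproduced in isolation. The following is a rough simulation with sqlite3, not Magento code: the tables are reduced to a few columns, the `last_visit_at` column and the `CLEAN_BEFORE` cut-off are assumptions standing in for whatever 'Visitor::getCleanTime()' yields, and the DELETE statements only approximate what the thread describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_visitor (
    visitor_id    INTEGER PRIMARY KEY,
    last_visit_at TEXT               -- assumed column name
);
CREATE TABLE report_event (
    event_type_id INTEGER,           -- 1 = catalog_product_view
    logged_at     TEXT,
    object_id     INTEGER,           -- product ID
    subject_id    INTEGER,           -- visitor ID or customer ID
    subtype       INTEGER            -- 0 = customer, 1 = visitor
);
""")

# Two visitors: visitor 1 is old, visitor 2 is recent.
conn.executemany(
    "INSERT INTO customer_visitor VALUES (?, ?)",
    [(1, "2021-09-01 12:00:00"), (2, "2021-09-20 12:00:00")],
)
# Three product views, all logged by visitors (subtype = 1).
conn.executemany(
    "INSERT INTO report_event VALUES (1, ?, ?, ?, 1)",
    [
        ("2021-09-01 12:00:00", 100, 1),
        ("2021-09-01 12:05:00", 101, 1),
        ("2021-09-20 12:00:00", 100, 2),
    ],
)


def count_views(conn):
    """Number of product view events still available for aggregation."""
    return conn.execute(
        "SELECT COUNT(*) FROM report_event WHERE event_type_id = 1"
    ).fetchone()[0]


views_before = count_views(conn)

# Step 1: the visitor cleanup removes old visitors (hypothetical cut-off).
CLEAN_BEFORE = "2021-09-15 00:00:00"
conn.execute("DELETE FROM customer_visitor WHERE last_visit_at < ?", (CLEAN_BEFORE,))

# Step 2: the reports cleanup removes visitor events whose visitor is gone.
conn.execute("""
    DELETE FROM report_event
    WHERE subtype = 1
      AND subject_id NOT IN (SELECT visitor_id FROM customer_visitor)
""")

views_after = count_views(conn)
print(views_before, views_after)
```

Any aggregation that reads 'report_event' after step 2 can only see the views of visitors that still exist, which is why a lifetime refresh ends up overwriting older statistics with incomplete numbers.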

SELECT * FROM report_viewed_product_index WHERE added_at >= '2021-09-19 00:00:00' AND added_at < '2021-09-20 00:00:00' returns 419 rows.

SELECT count(*) FROM report_event WHERE event_type_id = 1 AND logged_at >= '2021-09-19 00:00:00' AND logged_at < '2021-09-20 00:00:00' returns 3.
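The same comparison can be wrapped in a small helper to check any day. This is only a sketch: `daily_view_counts` is a hypothetical function, and the in-memory tables with sample rows exist purely so that the snippet runs on its own; against a real database you would pass your own connection.

```python
import sqlite3


def daily_view_counts(conn, day_start, day_end):
    """Return (rows in report_viewed_product_index, view events in report_event)
    for the half-open interval [day_start, day_end)."""
    index_rows = conn.execute(
        "SELECT COUNT(*) FROM report_viewed_product_index "
        "WHERE added_at >= ? AND added_at < ?",
        (day_start, day_end),
    ).fetchone()[0]
    event_rows = conn.execute(
        "SELECT COUNT(*) FROM report_event "
        "WHERE event_type_id = 1 AND logged_at >= ? AND logged_at < ?",
        (day_start, day_end),
    ).fetchone()[0]
    return index_rows, event_rows


# Demo data only: reduced tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report_viewed_product_index (product_id INTEGER, added_at TEXT);
CREATE TABLE report_event (event_type_id INTEGER, object_id INTEGER, logged_at TEXT);
""")
conn.executemany(
    "INSERT INTO report_viewed_product_index VALUES (?, ?)",
    [(100, "2021-09-19 08:00:00"), (101, "2021-09-19 09:00:00"),
     (102, "2021-09-19 23:59:59"), (100, "2021-09-20 00:00:00")],
)
conn.executemany(
    "INSERT INTO report_event VALUES (1, ?, ?)",
    [(100, "2021-09-19 08:00:00")],
)

counts = daily_view_counts(conn, "2021-09-19 00:00:00", "2021-09-20 00:00:00")
print(counts)
```

A large gap between the two numbers for a day older than the clean time is the symptom reported here.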

[Screenshots: report_event, report_viewed_product_index]

engcom-Lima commented 3 years ago

Hi @bernd-reindl,

Thank you for the detailed explanation.

What I understood from the explanation is that there is a discrepancy between the data in the tables report_viewed_product_index and report_event, because the visitor_clean cron clears the logs in the report_event table. So when we run 'Refresh Lifetime Statistics', the system provides inaccurate statistics. Please confirm whether I understood your issue correctly, or add what I missed.

I'll do further analysis accordingly.

bernd-reindl commented 3 years ago

@engcom-Lima that's correct, because after running the visitor_clean cron, the entries in report_event are missing. But 'Refresh Lifetime Statistics' uses report_event to aggregate the reports. Not report_viewed_product_index.

engcom-Lima commented 2 years ago

Hi @bernd-reindl,

I understand the issue now. In order to understand its impact, can you please share some screenshots of the data that shows up as corrupt, or of the data that is missing but should have been there?

It would be really helpful. I'll do further analysis accordingly.

Thanks

bernd-reindl commented 2 years ago

Hi @engcom-Lima

[Attachments: image, report_event.csv, report_viewed_product_index.csv]

As you can see, there are 68122 entries in report_viewed_product_index and 1542 entries since 1st of October. The most viewed statistics (image) show the top 5 products for each day. But these statistics are correct only for the current day.

Hope this helps.

m2-assistant[bot] commented 2 years ago

Hi @engcom-November. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:

engcom-November commented 2 years ago

We are confirming this issue as per the explanation. Additional information for dev: When cron cleans the visitor log, the report_event table is also cleaned, from app/code/Magento/Reports/Model/ResourceModel/Event.php:180. When we run the lifetime statistics refresh, it collects all data by joining the report_event table in app/code/Magento/Reports/Model/ResourceModel/Report/Product/Viewed.php:10

  1. The collection does not account for the data that was removed during the clean process.

github-jira-sync-bot commented 2 years ago

:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-6022 is successfully created for this GitHub issue.

m2-assistant[bot] commented 2 years ago

:white_check_mark: Confirmed by @engcom-November. Thank you for verifying the issue.
Issue Available: @engcom-November, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

github-jira-sync-bot commented 2 years ago

:x: Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.adobe.com/browse/AC-6022