loganbvh / py-tdgl

2D time-dependent Ginzburg-Landau in Python
MIT License
37 stars 13 forks source link

TypeError with Pardiso solver #74

Closed reed-foster closed 7 months ago

reed-foster commented 7 months ago

I followed the instructions for installation (installing through PyPI) and ran through the quickstart. I ran the testing suite and all of the tests passed. However, I noticed a TypeError when I tried to change the solver type to use PyPardiso (which I installed using conda install -c conda-forge pypardiso):

Here's the stack trace:

Traceback (most recent call last):
  File "[.../lib/python3.10/site-packages/tdgl/solver/solver.py", line 799](http://localhost:8888/.../lib/python3.10/site-packages/tdgl/solver/solver.py#line=798), in solve
    data_was_generated = runner.run()
  File "[.../lib/python3.10/site-packages/tdgl/solver/runner.py", line 305](http://localhost:8888/.../lib/python3.10/site-packages/tdgl/solver/runner.py#line=304), in run
    success = self._run_stage(
  File "[.../lib/python3.10/site-packages/tdgl/solver/runner.py", line 416](http://localhost:8888/.../lib/python3.10/site-packages/tdgl/solver/runner.py#line=415), in _run_stage
    function_result = self.function(
  File "[.../lib/python3.10/site-packages/tdgl/solver/solver.py", line 675](http://localhost:8888/.../lib/python3.10/site-packages/tdgl/solver/solver.py#line=674), in update
    mu, supercurrent, normal_current = self.solve_for_observables(psi, dA_dt)
  File "[.../lib/python3.10/site-packages/tdgl/solver/solver.py", line 509](http://localhost:8888/.../lib/python3.10/site-packages/tdgl/solver/solver.py#line=508), in solve_for_observables
    mu = pypardiso.spsolve(operators.mu_laplacian, rhs)
  File "[.../lib/python3.10/site-packages/pypardiso/scipy_aliases.py", line 44](http://localhost:8888/.../lib/python3.10/site-packages/pypardiso/scipy_aliases.py#line=43), in spsolve
    solver._check_A(A)
  File "[.../lib/python3.10/site-packages/pypardiso/pardiso_wrapper.py", line 227](http://localhost:8888/.../lib/python3.10/site-packages/pypardiso/pardiso_wrapper.py#line=226), in _check_A
    raise TypeError(msg)
TypeError: PyPardiso requires matrix A to be in CSR or CSC format, but matrix A is: <class 'scipy.sparse._csc.csc_array'>

It looks like when mu_laplacian is generated, it gets generated as a csc_array, but PyPardiso requires a csc_matrix.

If the following section is modified: https://github.com/loganbvh/py-tdgl/blob/ac8b2d9e07b9c681fe6c72fb04fd0dbbbd856840/tdgl/finite_volume/operators.py#L300-L301

if self.sparse_solver is SparseSolver.CUPY:
...
elif self.sparse_solver is SparseSolver.PARDISO:
+ self.mu_laplacian = sp._csc.csc_matrix(self.mu_laplacian)
  self.mu_laplacian_lu = None

Then the simulation runs fine (although it's not any faster than using the default SparseLU solver, but perhaps that's just because of the structure of the example simulation geometry).

loganbvh commented 7 months ago

Hi Reed,

Thanks for reporting this. scipy sparse arrays and scipy sparse matrices should be interchangeable in this context, so it makes sense that your fix worked. I chose to use the sparse array type in pyTDGL because that's what's recommended by scipy (see note here). This check in PyPardiso is unnecessarily restrictive. I opened a PR in PyPardiso to address this https://github.com/haasad/PyPardiso/issues/68.

Other people have also reported that the MKL pardiso solver is not any faster than SuperLU despite being multithreaded, so you're probably better off just using SuperLU. If you have access to an NVIDIA GPU, using the GPU + SuperLU is the fastest combination I have found. If you're interested my testing is here https://github.com/loganbvh/py-tdgl/issues/34#issuecomment-1732427524

reed-foster commented 7 months ago

Hi Logan,

Thanks, that makes sense. After doing some further testing with the quickstart example, it definitely seems like SuperLU is the best choice for this geometry/mesh. Interestingly enough, using my NVIDIA GPU seems to slow things down for the quickstart example (4,671 mesh sites). When I monitor the GPU with nvidia-smi, it seems like the GPU utilization is rather low (typically around 10% utilization; not much more than when loading a webpage e.g., except when everything is solved on the GPU with CUPY where it reaches >90% utilization). I guess this is because the mesh is relatively small?

Here's the mesh information:

{
  'num_sites': 4671,
  'num_elements': 8748,
  'min_edge_length': 0.037649046290826015,
  'max_edge_length': 0.251608033926911,
  'mean_edge_length': 0.14058280371468113,
  'min_area': 0.0008161429472105218,
  'max_area': 0.035097437653613554,
  'mean_area': 0.016480442066212443,
  'coherence_length': 0.5,
  'length_units': 'um',
}

no GPU + SuperLU (59s total):

profile of tdgl.TDGLSolver.update() ``` Timer unit: 1e-06 s Total time: 58.5067 s File: .../lib/python3.10/site-packages/tdgl/solver/solver.py Function: update at line 575 Line # Hits Time Per Hit % Time Line Contents ============================================================== 575 @profile 576 def update( 577 self, 578 state: Dict[str, Union[int, float]], 579 running_state: RunningState, 580 dt: float, 581 *, 582 psi: np.ndarray, 583 mu: np.ndarray, 584 supercurrent: np.ndarray, 585 normal_current: np.ndarray, 586 induced_vector_potential: np.ndarray, 587 applied_vector_potential: Optional[np.ndarray] = None, 588 epsilon: Optional[np.ndarray] = None, 589 ) -> SolverResult: 590 """This method is called at each time step to update the state of the system. 591 592 Args: 593 state: The solver state, i.e., the solve step, time, and time step 594 running_state: A container for scalar data that is saved at each time step 595 dt: The time step for the previous solve step 596 psi: The order parameter 597 mu: The scalar potential 598 supercurrent: The supercurrent density 599 normal_current: The normal current density 600 induced_vector_potential: The induced vector potential 601 applied_vector_potential: The applied vector potential. This will be ``None… 602 in the case of a time-independent vector potential. 603 epsilon: The disorder parameter ``epsilon``. This will be ``None`` 604 in the case of a time-independent ``epsilon``. 605 606 Returns: 607 A :class:`tdgl.SolverResult` instance for the solve step. 608 """ 609 55127 20221.2 0.4 0.0 xp = self.xp 610 55127 11828.3 0.2 0.0 options = self.options 611 55127 8004.7 0.1 0.0 operators = self.operators 612 613 55127 13078.6 0.2 0.0 step = state["step"] 614 55127 9297.1 0.2 0.0 time = state["time"] 615 55127 6895.7 0.1 0.0 A_induced = induced_vector_potential 616 55127 9249.0 0.2 0.0 prev_A_applied = A_applied = applied_vector_potential 617 618 # Update the scalar potential boundary conditions. 619 55127 613730.7 11.1 1.0 self.update_mu_boundary(time) 620 621 # Update the applied vector potential. 622 55127 7549.0 0.1 0.0 dA_dt = 0.0 623 55127 9065.4 0.2 0.0 current_A_applied = self.current_A_applied 624 55127 13562.4 0.2 0.0 if self.dynamic_vector_potential: 625 current_A_applied = self.update_applied_vector_potential(time) 626 dA_dt = xp.einsum( 627 "ij, ij -> i", 628 (current_A_applied - prev_A_applied) / dt, 629 self.normalized_directions, 630 ) 631 if xp.any(xp.absolute(dA_dt) > 0): 632 # Update the link exponents only if the applied vector potential 633 # has actually changed. 634 operators.set_link_exponents(current_A_applied) 635 else: 636 55127 11455.2 0.2 0.0 assert A_applied is None 637 55127 10291.9 0.2 0.0 prev_A_applied = A_applied = current_A_applied 638 55127 15855.3 0.3 0.0 self.current_A_applied = current_A_applied 639 640 # Update the value of epsilon 641 55127 8490.0 0.2 0.0 epsilon = self.epsilon 642 55127 8986.2 0.2 0.0 if self.dynamic_epsilon: 643 epsilon = self.epsilon = self.update_epsilon(time) 644 645 55127 663885.7 12.0 1.1 old_sq_psi = xp.absolute(psi) ** 2 646 55127 16587.5 0.3 0.0 screening_error = np.inf 647 55127 12445.1 0.2 0.0 A_induced_vals = [A_induced] 648 55127 9247.6 0.2 0.0 velocity = [0.0] # Velocity for Polyak's method 649 # This loop runs only once if options.include_screening is False 650 55127 37115.2 0.7 0.1 for screening_iteration in itertools.count(): 651 55127 23713.1 0.4 0.0 if screening_error < options.screening_tolerance: 652 break 653 55127 16844.0 0.3 0.0 if screening_iteration > options.max_iterations_per_step: 654 raise RuntimeError( 655 f"Screening calculation failed to converge at step {step} after" 656 f" {options.max_iterations_per_step} iterations. Relative error in" 657 f" induced vector potential: {screening_error:.2e}" 658 f" (tolerance: {options.screening_tolerance:.2e})." 659 ) 660 661 # Adjust the time step and calculate the new the order parameter 662 55127 11452.1 0.2 0.0 if screening_iteration == 0: 663 # Find a new time step only for the first screening iteration. 664 55127 9567.6 0.2 0.0 dt = self.tentative_dt 665 666 55127 11878.7 0.2 0.0 if options.include_screening: 667 # Update the link variables in the covariant Laplacian and gradient 668 # for psi based on the induced vector potential from the previous itera… 669 operators.set_link_exponents(current_A_applied + A_induced) 670 671 # Update the order parameter using an adaptive time step 672 110254 19942051.0 180.9 34.1 psi, abs_sq_psi, dt = self.adaptive_euler_step( 673 55127 9288.6 0.2 0.0 step, psi, old_sq_psi, mu, epsilon, dt 674 ) 675 # Update the scalar potential, supercurrent density, and normal current den… 676 55127 33505171.8 607.8 57.3 mu, supercurrent, normal_current = self.solve_for_observables(psi, dA_dt) 677 678 55127 28383.7 0.5 0.0 if options.include_screening: 679 # Evaluate the induced vector potential 680 A_induced, screening_error = self.get_induced_vector_potential( 681 supercurrent + normal_current, A_induced_vals, velocity 682 ) 683 else: 684 55127 12261.8 0.2 0.0 break 685 686 55127 156739.1 2.8 0.3 running_state.append("dt", dt) 687 55127 13602.7 0.2 0.0 if self.probe_points is not None: 688 # Update the voltage and phase difference 689 55127 233421.0 4.2 0.4 running_state.append("mu", mu[self.probe_points]) 690 55127 380937.3 6.9 0.7 running_state.append("theta", xp.angle(psi[self.probe_points])) 691 55127 14060.5 0.3 0.0 if options.include_screening: 692 running_state.append("screening_iterations", screening_iteration) 693 694 55127 14473.4 0.3 0.0 if options.adaptive: 695 # Compute the max abs change in |psi|^2, averaged over the adaptive window, 696 # and use it to select a new time step. 697 55127 688837.5 12.5 1.2 self.d_psi_sq_vals.append(float(xp.absolute(abs_sq_psi - old_sq_psi).max())) 698 55127 16326.0 0.3 0.0 window = options.adaptive_window 699 55127 13853.2 0.3 0.0 if step > window: 700 110210 62529.6 0.6 0.1 new_dt = options.dt_init / max( 701 55105 1116285.0 20.3 1.9 1e-10, np.mean(self.d_psi_sq_vals[-window:]) 702 ) 703 55105 586112.0 10.6 1.0 self.tentative_dt = np.clip(0.5 * (new_dt + dt), 0, self.dt_max) 704 705 55127 20816.6 0.4 0.0 results = [dt, psi, mu, supercurrent, normal_current, A_induced] 706 55127 12465.3 0.2 0.0 if self.dynamic_vector_potential: 707 results.append(current_A_applied) 708 55127 7388.1 0.1 0.0 if self.dynamic_epsilon: 709 results.append(epsilon) 710 55127 81410.1 1.5 0.1 return SolverResult(*results) 58.51 seconds - .../lib/python3.10/site-packages/tdgl/solver/solver.py:575 - update ```

GPU + SuperLU (135s total):

profile of tdgl.TDGLSolver.update() ``` Timer unit: 1e-06 s Total time: 134.917 s File: .../lib/python3.10/site-packages/tdgl/solver/solver.py Function: update at line 575 Line # Hits Time Per Hit % Time Line Contents ============================================================== 575 @profile 576 def update( 577 self, 578 state: Dict[str, Union[int, float]], 579 running_state: RunningState, 580 dt: float, 581 *, 582 psi: np.ndarray, 583 mu: np.ndarray, 584 supercurrent: np.ndarray, 585 normal_current: np.ndarray, 586 induced_vector_potential: np.ndarray, 587 applied_vector_potential: Optional[np.ndarray] = None, 588 epsilon: Optional[np.ndarray] = None, 589 ) -> SolverResult: 590 """This method is called at each time step to update the state of the system. 591 592 Args: 593 state: The solver state, i.e., the solve step, time, and time step 594 running_state: A container for scalar data that is saved at each time step 595 dt: The time step for the previous solve step 596 psi: The order parameter 597 mu: The scalar potential 598 supercurrent: The supercurrent density 599 normal_current: The normal current density 600 induced_vector_potential: The induced vector potential 601 applied_vector_potential: The applied vector potential. This will be ``None… 602 in the case of a time-independent vector potential. 603 epsilon: The disorder parameter ``epsilon``. This will be ``None`` 604 in the case of a time-independent ``epsilon``. 605 606 Returns: 607 A :class:`tdgl.SolverResult` instance for the solve step. 608 """ 609 55128 22236.5 0.4 0.0 xp = self.xp 610 55128 11268.0 0.2 0.0 options = self.options 611 55128 8907.9 0.2 0.0 operators = self.operators 612 613 55128 14424.5 0.3 0.0 step = state["step"] 614 55128 8535.6 0.2 0.0 time = state["time"] 615 55128 7572.6 0.1 0.0 A_induced = induced_vector_potential 616 55128 9544.8 0.2 0.0 prev_A_applied = A_applied = applied_vector_potential 617 618 # Update the scalar potential boundary conditions. 619 55128 669942.1 12.2 0.5 self.update_mu_boundary(time) 620 621 # Update the applied vector potential. 622 55128 8990.0 0.2 0.0 dA_dt = 0.0 623 55128 10487.0 0.2 0.0 current_A_applied = self.current_A_applied 624 55128 13090.8 0.2 0.0 if self.dynamic_vector_potential: 625 current_A_applied = self.update_applied_vector_potential(time) 626 dA_dt = xp.einsum( 627 "ij, ij -> i", 628 (current_A_applied - prev_A_applied) / dt, 629 self.normalized_directions, 630 ) 631 if xp.any(xp.absolute(dA_dt) > 0): 632 # Update the link exponents only if the applied vector potential 633 # has actually changed. 634 operators.set_link_exponents(current_A_applied) 635 else: 636 55128 11047.2 0.2 0.0 assert A_applied is None 637 55128 9658.8 0.2 0.0 prev_A_applied = A_applied = current_A_applied 638 55128 13364.3 0.2 0.0 self.current_A_applied = current_A_applied 639 640 # Update the value of epsilon 641 55128 8202.6 0.1 0.0 epsilon = self.epsilon 642 55128 9491.3 0.2 0.0 if self.dynamic_epsilon: 643 epsilon = self.epsilon = self.update_epsilon(time) 644 645 55128 3167676.6 57.5 2.3 old_sq_psi = xp.absolute(psi) ** 2 646 55128 25386.4 0.5 0.0 screening_error = np.inf 647 55128 14648.8 0.3 0.0 A_induced_vals = [A_induced] 648 55128 10102.3 0.2 0.0 velocity = [0.0] # Velocity for Polyak's method 649 # This loop runs only once if options.include_screening is False 650 55128 54275.0 1.0 0.0 for screening_iteration in itertools.count(): 651 55128 27600.3 0.5 0.0 if screening_error < options.screening_tolerance: 652 break 653 55128 17876.4 0.3 0.0 if screening_iteration > options.max_iterations_per_step: 654 raise RuntimeError( 655 f"Screening calculation failed to converge at step {step} after" 656 f" {options.max_iterations_per_step} iterations. Relative error in" 657 f" induced vector potential: {screening_error:.2e}" 658 f" (tolerance: {options.screening_tolerance:.2e})." 659 ) 660 661 # Adjust the time step and calculate the new the order parameter 662 55128 10289.1 0.2 0.0 if screening_iteration == 0: 663 # Find a new time step only for the first screening iteration. 664 55128 10031.3 0.2 0.0 dt = self.tentative_dt 665 666 55128 13196.2 0.2 0.0 if options.include_screening: 667 # Update the link variables in the covariant Laplacian and gradient 668 # for psi based on the induced vector potential from the previous itera… 669 operators.set_link_exponents(current_A_applied + A_induced) 670 671 # Update the order parameter using an adaptive time step 672 110256 45846835.3 415.8 34.0 psi, abs_sq_psi, dt = self.adaptive_euler_step( 673 55128 9634.5 0.2 0.0 step, psi, old_sq_psi, mu, epsilon, dt 674 ) 675 # Update the scalar potential, supercurrent density, and normal current den… 676 55128 63170829.8 1145.9 46.8 mu, supercurrent, normal_current = self.solve_for_observables(psi, dA_dt) 677 678 55128 32276.9 0.6 0.0 if options.include_screening: 679 # Evaluate the induced vector potential 680 A_induced, screening_error = self.get_induced_vector_potential( 681 supercurrent + normal_current, A_induced_vals, velocity 682 ) 683 else: 684 55128 16313.5 0.3 0.0 break 685 686 55128 1357769.3 24.6 1.0 running_state.append("dt", dt) 687 55128 21503.3 0.4 0.0 if self.probe_points is not None: 688 # Update the voltage and phase difference 689 55128 6453157.5 117.1 4.8 running_state.append("mu", mu[self.probe_points]) 690 55128 6737665.4 122.2 5.0 running_state.append("theta", xp.angle(psi[self.probe_points])) 691 55128 26417.6 0.5 0.0 if options.include_screening: 692 running_state.append("screening_iterations", screening_iteration) 693 694 55128 15339.2 0.3 0.0 if options.adaptive: 695 # Compute the max abs change in |psi|^2, averaged over the adaptive window, 696 # and use it to select a new time step. 697 55128 4624941.6 83.9 3.4 self.d_psi_sq_vals.append(float(xp.absolute(abs_sq_psi - old_sq_psi).max())) 698 55128 24540.4 0.4 0.0 window = options.adaptive_window 699 55128 14218.6 0.3 0.0 if step > window: 700 110212 73242.8 0.7 0.1 new_dt = options.dt_init / max( 701 55106 1466267.4 26.6 1.1 1e-10, np.mean(self.d_psi_sq_vals[-window:]) 702 ) 703 55106 705099.5 12.8 0.5 self.tentative_dt = np.clip(0.5 * (new_dt + dt), 0, self.dt_max) 704 705 55128 22271.0 0.4 0.0 results = [dt, psi, mu, supercurrent, normal_current, A_induced] 706 55128 14218.6 0.3 0.0 if self.dynamic_vector_potential: 707 results.append(current_A_applied) 708 55128 7964.6 0.1 0.0 if self.dynamic_epsilon: 709 results.append(epsilon) 710 55128 88776.7 1.6 0.1 return SolverResult(*results) 134.92 seconds - .../lib/python3.10/site-packages/tdgl/solver/solver.py:575 - update ```

GPU + CUPY (821s total):

profile of tdgl.TDGLSolver.update() ``` Timer unit: 1e-06 s Total time: 821.029 s File: .../lib/python3.10/site-packages/tdgl/solver/solver.py Function: update at line 575 Line # Hits Time Per Hit % Time Line Contents ============================================================== 575 @profile 576 def update( 577 self, 578 state: Dict[str, Union[int, float]], 579 running_state: RunningState, 580 dt: float, 581 *, 582 psi: np.ndarray, 583 mu: np.ndarray, 584 supercurrent: np.ndarray, 585 normal_current: np.ndarray, 586 induced_vector_potential: np.ndarray, 587 applied_vector_potential: Optional[np.ndarray] = None, 588 epsilon: Optional[np.ndarray] = None, 589 ) -> SolverResult: 590 """This method is called at each time step to update the state of the system. 591 592 Args: 593 state: The solver state, i.e., the solve step, time, and time step 594 running_state: A container for scalar data that is saved at each time step 595 dt: The time step for the previous solve step 596 psi: The order parameter 597 mu: The scalar potential 598 supercurrent: The supercurrent density 599 normal_current: The normal current density 600 induced_vector_potential: The induced vector potential 601 applied_vector_potential: The applied vector potential. This will be ``None… 602 in the case of a time-independent vector potential. 603 epsilon: The disorder parameter ``epsilon``. This will be ``None`` 604 in the case of a time-independent ``epsilon``. 605 606 Returns: 607 A :class:`tdgl.SolverResult` instance for the solve step. 608 """ 609 55122 26419.7 0.5 0.0 xp = self.xp 610 55122 13216.5 0.2 0.0 options = self.options 611 55122 9002.3 0.2 0.0 operators = self.operators 612 613 55122 17421.0 0.3 0.0 step = state["step"] 614 55122 10395.8 0.2 0.0 time = state["time"] 615 55122 7089.1 0.1 0.0 A_induced = induced_vector_potential 616 55122 9617.0 0.2 0.0 prev_A_applied = A_applied = applied_vector_potential 617 618 # Update the scalar potential boundary conditions. 619 55122 713736.2 12.9 0.1 self.update_mu_boundary(time) 620 621 # Update the applied vector potential. 622 55122 9818.4 0.2 0.0 dA_dt = 0.0 623 55122 10041.1 0.2 0.0 current_A_applied = self.current_A_applied 624 55122 14000.4 0.3 0.0 if self.dynamic_vector_potential: 625 current_A_applied = self.update_applied_vector_potential(time) 626 dA_dt = xp.einsum( 627 "ij, ij -> i", 628 (current_A_applied - prev_A_applied) / dt, 629 self.normalized_directions, 630 ) 631 if xp.any(xp.absolute(dA_dt) > 0): 632 # Update the link exponents only if the applied vector potential 633 # has actually changed. 634 operators.set_link_exponents(current_A_applied) 635 else: 636 55122 11599.2 0.2 0.0 assert A_applied is None 637 55122 10001.9 0.2 0.0 prev_A_applied = A_applied = current_A_applied 638 55122 17402.2 0.3 0.0 self.current_A_applied = current_A_applied 639 640 # Update the value of epsilon 641 55122 8783.8 0.2 0.0 epsilon = self.epsilon 642 55122 8975.2 0.2 0.0 if self.dynamic_epsilon: 643 epsilon = self.epsilon = self.update_epsilon(time) 644 645 55122 3417567.6 62.0 0.4 old_sq_psi = xp.absolute(psi) ** 2 646 55122 29208.6 0.5 0.0 screening_error = np.inf 647 55122 15749.5 0.3 0.0 A_induced_vals = [A_induced] 648 55122 10174.5 0.2 0.0 velocity = [0.0] # Velocity for Polyak's method 649 # This loop runs only once if options.include_screening is False 650 55122 52930.6 1.0 0.0 for screening_iteration in itertools.count(): 651 55122 30079.0 0.5 0.0 if screening_error < options.screening_tolerance: 652 break 653 55122 17925.7 0.3 0.0 if screening_iteration > options.max_iterations_per_step: 654 raise RuntimeError( 655 f"Screening calculation failed to converge at step {step} after" 656 f" {options.max_iterations_per_step} iterations. Relative error in" 657 f" induced vector potential: {screening_error:.2e}" 658 f" (tolerance: {options.screening_tolerance:.2e})." 659 ) 660 661 # Adjust the time step and calculate the new the order parameter 662 55122 11488.7 0.2 0.0 if screening_iteration == 0: 663 # Find a new time step only for the first screening iteration. 664 55122 12228.1 0.2 0.0 dt = self.tentative_dt 665 666 55122 16095.6 0.3 0.0 if options.include_screening: 667 # Update the link variables in the covariant Laplacian and gradient 668 # for psi based on the induced vector potential from the previous itera… 669 operators.set_link_exponents(current_A_applied + A_induced) 670 671 # Update the order parameter using an adaptive time step 672 110244 48408993.7 439.1 5.9 psi, abs_sq_psi, dt = self.adaptive_euler_step( 673 55122 9704.3 0.2 0.0 step, psi, old_sq_psi, mu, epsilon, dt 674 ) 675 # Update the scalar potential, supercurrent density, and normal current den… 676 55122 671478523.3 12181.7 81.8 mu, supercurrent, normal_current = self.solve_for_observables(psi, dA_dt) 677 678 55122 38041.1 0.7 0.0 if options.include_screening: 679 # Evaluate the induced vector potential 680 A_induced, screening_error = self.get_induced_vector_potential( 681 supercurrent + normal_current, A_induced_vals, velocity 682 ) 683 else: 684 55122 20026.9 0.4 0.0 break 685 686 55122 1420228.9 25.8 0.2 running_state.append("dt", dt) 687 55122 26515.5 0.5 0.0 if self.probe_points is not None: 688 # Update the voltage and phase difference 689 55122 7138637.8 129.5 0.9 running_state.append("mu", mu[self.probe_points]) 690 55122 6817231.1 123.7 0.8 running_state.append("theta", xp.angle(psi[self.probe_points])) 691 55122 29351.4 0.5 0.0 if options.include_screening: 692 running_state.append("screening_iterations", screening_iteration) 693 694 55122 15928.9 0.3 0.0 if options.adaptive: 695 # Compute the max abs change in |psi|^2, averaged over the adaptive window, 696 # and use it to select a new time step. 697 55122 78341367.2 1421.2 9.5 self.d_psi_sq_vals.append(float(xp.absolute(abs_sq_psi - old_sq_psi).max())) 698 55122 28763.9 0.5 0.0 window = options.adaptive_window 699 55122 13169.6 0.2 0.0 if step > window: 700 110200 97621.3 0.9 0.0 new_dt = options.dt_init / max( 701 55100 1693785.9 30.7 0.2 1e-10, np.mean(self.d_psi_sq_vals[-window:]) 702 ) 703 55100 776416.7 14.1 0.1 self.tentative_dt = np.clip(0.5 * (new_dt + dt), 0, self.dt_max) 704 705 55122 23183.5 0.4 0.0 results = [dt, psi, mu, supercurrent, normal_current, A_induced] 706 55122 14126.3 0.3 0.0 if self.dynamic_vector_potential: 707 results.append(current_A_applied) 708 55122 8090.9 0.1 0.0 if self.dynamic_epsilon: 709 results.append(epsilon) 710 55122 118367.7 2.1 0.0 return SolverResult(*results) 821.03 seconds - .../lib/python3.10/site-packages/tdgl/solver/solver.py:575 - update ```

GPU + Paradiso (172s total):

profile of tdgl.TDGLSolver.update() ``` Timer unit: 1e-06 s Total time: 172.268 s File: .../lib/python3.10/site-packages/tdgl/solver/solver.py Function: update at line 575 Line # Hits Time Per Hit % Time Line Contents ============================================================== 575 @profile 576 def update( 577 self, 578 state: Dict[str, Union[int, float]], 579 running_state: RunningState, 580 dt: float, 581 *, 582 psi: np.ndarray, 583 mu: np.ndarray, 584 supercurrent: np.ndarray, 585 normal_current: np.ndarray, 586 induced_vector_potential: np.ndarray, 587 applied_vector_potential: Optional[np.ndarray] = None, 588 epsilon: Optional[np.ndarray] = None, 589 ) -> SolverResult: 590 """This method is called at each time step to update the state of the system. 591 592 Args: 593 state: The solver state, i.e., the solve step, time, and time step 594 running_state: A container for scalar data that is saved at each time step 595 dt: The time step for the previous solve step 596 psi: The order parameter 597 mu: The scalar potential 598 supercurrent: The supercurrent density 599 normal_current: The normal current density 600 induced_vector_potential: The induced vector potential 601 applied_vector_potential: The applied vector potential. This will be ``None… 602 in the case of a time-independent vector potential. 603 epsilon: The disorder parameter ``epsilon``. This will be ``None`` 604 in the case of a time-independent ``epsilon``. 605 606 Returns: 607 A :class:`tdgl.SolverResult` instance for the solve step. 608 """ 609 55122 26112.6 0.5 0.0 xp = self.xp 610 55122 11626.2 0.2 0.0 options = self.options 611 55122 7521.8 0.1 0.0 operators = self.operators 612 613 55122 14015.2 0.3 0.0 step = state["step"] 614 55122 8171.5 0.1 0.0 time = state["time"] 615 55122 6714.4 0.1 0.0 A_induced = induced_vector_potential 616 55122 9478.3 0.2 0.0 prev_A_applied = A_applied = applied_vector_potential 617 618 # Update the scalar potential boundary conditions. 619 55122 731965.1 13.3 0.4 self.update_mu_boundary(time) 620 621 # Update the applied vector potential. 622 55122 9676.1 0.2 0.0 dA_dt = 0.0 623 55122 9376.6 0.2 0.0 current_A_applied = self.current_A_applied 624 55122 13666.8 0.2 0.0 if self.dynamic_vector_potential: 625 current_A_applied = self.update_applied_vector_potential(time) 626 dA_dt = xp.einsum( 627 "ij, ij -> i", 628 (current_A_applied - prev_A_applied) / dt, 629 self.normalized_directions, 630 ) 631 if xp.any(xp.absolute(dA_dt) > 0): 632 # Update the link exponents only if the applied vector potential 633 # has actually changed. 634 operators.set_link_exponents(current_A_applied) 635 else: 636 55122 11245.9 0.2 0.0 assert A_applied is None 637 55122 10603.8 0.2 0.0 prev_A_applied = A_applied = current_A_applied 638 55122 14342.8 0.3 0.0 self.current_A_applied = current_A_applied 639 640 # Update the value of epsilon 641 55122 9011.3 0.2 0.0 epsilon = self.epsilon 642 55122 8711.4 0.2 0.0 if self.dynamic_epsilon: 643 epsilon = self.epsilon = self.update_epsilon(time) 644 645 55122 3369066.2 61.1 2.0 old_sq_psi = xp.absolute(psi) ** 2 646 55122 28197.6 0.5 0.0 screening_error = np.inf 647 55122 14901.7 0.3 0.0 A_induced_vals = [A_induced] 648 55122 9564.3 0.2 0.0 velocity = [0.0] # Velocity for Polyak's method 649 # This loop runs only once if options.include_screening is False 650 55122 54961.5 1.0 0.0 for screening_iteration in itertools.count(): 651 55122 27576.1 0.5 0.0 if screening_error < options.screening_tolerance: 652 break 653 55122 16793.4 0.3 0.0 if screening_iteration > options.max_iterations_per_step: 654 raise RuntimeError( 655 f"Screening calculation failed to converge at step {step} after" 656 f" {options.max_iterations_per_step} iterations. Relative error in" 657 f" induced vector potential: {screening_error:.2e}" 658 f" (tolerance: {options.screening_tolerance:.2e})." 659 ) 660 661 # Adjust the time step and calculate the new the order parameter 662 55122 10893.8 0.2 0.0 if screening_iteration == 0: 663 # Find a new time step only for the first screening iteration. 664 55122 10318.3 0.2 0.0 dt = self.tentative_dt 665 666 55122 11469.6 0.2 0.0 if options.include_screening: 667 # Update the link variables in the covariant Laplacian and gradient 668 # for psi based on the induced vector potential from the previous itera… 669 operators.set_link_exponents(current_A_applied + A_induced) 670 671 # Update the order parameter using an adaptive time step 672 110244 49797626.5 451.7 28.9 psi, abs_sq_psi, dt = self.adaptive_euler_step( 673 55122 9828.4 0.2 0.0 step, psi, old_sq_psi, mu, epsilon, dt 674 ) 675 # Update the scalar potential, supercurrent density, and normal current den… 676 55122 94558604.9 1715.4 54.9 mu, supercurrent, normal_current = self.solve_for_observables(psi, dA_dt) 677 678 55122 32139.3 0.6 0.0 if options.include_screening: 679 # Evaluate the induced vector potential 680 A_induced, screening_error = self.get_induced_vector_potential( 681 supercurrent + normal_current, A_induced_vals, velocity 682 ) 683 else: 684 55122 18330.8 0.3 0.0 break 685 686 55122 1527490.2 27.7 0.9 running_state.append("dt", dt) 687 55122 23577.8 0.4 0.0 if self.probe_points is not None: 688 # Update the voltage and phase difference 689 55122 6928452.3 125.7 4.0 running_state.append("mu", mu[self.probe_points]) 690 55122 7241052.5 131.4 4.2 running_state.append("theta", xp.angle(psi[self.probe_points])) 691 55122 27977.5 0.5 0.0 if options.include_screening: 692 running_state.append("screening_iterations", screening_iteration) 693 694 55122 15461.8 0.3 0.0 if options.adaptive: 695 # Compute the max abs change in |psi|^2, averaged over the adaptive window, 696 # and use it to select a new time step. 697 55122 5102494.5 92.6 3.0 self.d_psi_sq_vals.append(float(xp.absolute(abs_sq_psi - old_sq_psi).max())) 698 55122 25946.7 0.5 0.0 window = options.adaptive_window 699 55122 14647.7 0.3 0.0 if step > window: 700 110200 74801.0 0.7 0.0 new_dt = options.dt_init / max( 701 55100 1519483.5 27.6 0.9 1e-10, np.mean(self.d_psi_sq_vals[-window:]) 702 ) 703 55100 745196.2 13.5 0.4 self.tentative_dt = np.clip(0.5 * (new_dt + dt), 0, self.dt_max) 704 705 55122 24435.9 0.4 0.0 results = [dt, psi, mu, supercurrent, normal_current, A_induced] 706 55122 15061.4 0.3 0.0 if self.dynamic_vector_potential: 707 results.append(current_A_applied) 708 55122 8943.6 0.2 0.0 if self.dynamic_epsilon: 709 results.append(epsilon) 710 55122 100507.8 1.8 0.1 return SolverResult(*results) 172.27 seconds - .../lib/python3.10/site-packages/tdgl/solver/solver.py:575 - update ```

No GPU + Paradiso (91s total):

profile of tdgl.TDGLSolver.update() ``` Timer unit: 1e-06 s Total time: 91.0203 s File: .../lib/python3.10/site-packages/tdgl/solver/solver.py Function: update at line 575 Line # Hits Time Per Hit % Time Line Contents ============================================================== 575 @profile 576 def update( 577 self, 578 state: Dict[str, Union[int, float]], 579 running_state: RunningState, 580 dt: float, 581 *, 582 psi: np.ndarray, 583 mu: np.ndarray, 584 supercurrent: np.ndarray, 585 normal_current: np.ndarray, 586 induced_vector_potential: np.ndarray, 587 applied_vector_potential: Optional[np.ndarray] = None, 588 epsilon: Optional[np.ndarray] = None, 589 ) -> SolverResult: 590 """This method is called at each time step to update the state of the system. 591 592 Args: 593 state: The solver state, i.e., the solve step, time, and time step 594 running_state: A container for scalar data that is saved at each time step 595 dt: The time step for the previous solve step 596 psi: The order parameter 597 mu: The scalar potential 598 supercurrent: The supercurrent density 599 normal_current: The normal current density 600 induced_vector_potential: The induced vector potential 601 applied_vector_potential: The applied vector potential. This will be ``None… 602 in the case of a time-independent vector potential. 603 epsilon: The disorder parameter ``epsilon``. This will be ``None`` 604 in the case of a time-independent ``epsilon``. 605 606 Returns: 607 A :class:`tdgl.SolverResult` instance for the solve step. 608 """ 609 55134 23359.2 0.4 0.0 xp = self.xp 610 55134 11995.6 0.2 0.0 options = self.options 611 55134 8811.1 0.2 0.0 operators = self.operators 612 613 55134 13873.9 0.3 0.0 step = state["step"] 614 55134 10144.2 0.2 0.0 time = state["time"] 615 55134 7342.2 0.1 0.0 A_induced = induced_vector_potential 616 55134 9625.6 0.2 0.0 prev_A_applied = A_applied = applied_vector_potential 617 618 # Update the scalar potential boundary conditions. 619 55134 677989.6 12.3 0.7 self.update_mu_boundary(time) 620 621 # Update the applied vector potential. 622 55134 8847.4 0.2 0.0 dA_dt = 0.0 623 55134 9565.6 0.2 0.0 current_A_applied = self.current_A_applied 624 55134 14684.8 0.3 0.0 if self.dynamic_vector_potential: 625 current_A_applied = self.update_applied_vector_potential(time) 626 dA_dt = xp.einsum( 627 "ij, ij -> i", 628 (current_A_applied - prev_A_applied) / dt, 629 self.normalized_directions, 630 ) 631 if xp.any(xp.absolute(dA_dt) > 0): 632 # Update the link exponents only if the applied vector potential 633 # has actually changed. 634 operators.set_link_exponents(current_A_applied) 635 else: 636 55134 11787.5 0.2 0.0 assert A_applied is None 637 55134 11317.1 0.2 0.0 prev_A_applied = A_applied = current_A_applied 638 55134 14915.3 0.3 0.0 self.current_A_applied = current_A_applied 639 640 # Update the value of epsilon 641 55134 9579.5 0.2 0.0 epsilon = self.epsilon 642 55134 9542.5 0.2 0.0 if self.dynamic_epsilon: 643 epsilon = self.epsilon = self.update_epsilon(time) 644 645 55134 751430.0 13.6 0.8 old_sq_psi = xp.absolute(psi) ** 2 646 55134 21758.2 0.4 0.0 screening_error = np.inf 647 55134 13026.5 0.2 0.0 A_induced_vals = [A_induced] 648 55134 10070.8 0.2 0.0 velocity = [0.0] # Velocity for Polyak's method 649 # This loop runs only once if options.include_screening is False 650 55134 42506.5 0.8 0.0 for screening_iteration in itertools.count(): 651 55134 25290.2 0.5 0.0 if screening_error < options.screening_tolerance: 652 break 653 55134 16262.6 0.3 0.0 if screening_iteration > options.max_iterations_per_step: 654 raise RuntimeError( 655 f"Screening calculation failed to converge at step {step} after" 656 f" {options.max_iterations_per_step} iterations. Relative error in" 657 f" induced vector potential: {screening_error:.2e}" 658 f" (tolerance: {options.screening_tolerance:.2e})." 659 ) 660 661 # Adjust the time step and calculate the new the order parameter 662 55134 12015.3 0.2 0.0 if screening_iteration == 0: 663 # Find a new time step only for the first screening iteration. 664 55134 11025.9 0.2 0.0 dt = self.tentative_dt 665 666 55134 13234.2 0.2 0.0 if options.include_screening: 667 # Update the link variables in the covariant Laplacian and gradient 668 # for psi based on the induced vector potential from the previous itera… 669 operators.set_link_exponents(current_A_applied + A_induced) 670 671 # Update the order parameter using an adaptive time step 672 110268 22074107.8 200.2 24.3 psi, abs_sq_psi, dt = self.adaptive_euler_step( 673 55134 9848.1 0.2 0.0 step, psi, old_sq_psi, mu, epsilon, dt 674 ) 675 # Update the scalar potential, supercurrent density, and normal current den… 676 55134 63151550.9 1145.4 69.4 mu, supercurrent, normal_current = self.solve_for_observables(psi, dA_dt) 677 678 55134 39603.0 0.7 0.0 if options.include_screening: 679 # Evaluate the induced vector potential 680 A_induced, screening_error = self.get_induced_vector_potential( 681 supercurrent + normal_current, A_induced_vals, velocity 682 ) 683 else: 684 55134 16828.8 0.3 0.0 break 685 686 55134 199753.2 3.6 0.2 running_state.append("dt", dt) 687 55134 15480.6 0.3 0.0 if self.probe_points is not None: 688 # Update the voltage and phase difference 689 55134 277690.6 5.0 0.3 running_state.append("mu", mu[self.probe_points]) 690 55134 429112.3 7.8 0.5 running_state.append("theta", xp.angle(psi[self.probe_points])) 691 55134 17114.8 0.3 0.0 if options.include_screening: 692 running_state.append("screening_iterations", screening_iteration) 693 694 55134 13027.9 0.2 0.0 if options.adaptive: 695 # Compute the max abs change in |psi|^2, averaged over the adaptive window, 696 # and use it to select a new time step. 697 55134 795271.5 14.4 0.9 self.d_psi_sq_vals.append(float(xp.absolute(abs_sq_psi - old_sq_psi).max())) 698 55134 28192.6 0.5 0.0 window = options.adaptive_window 699 55134 15248.3 0.3 0.0 if step > window: 700 110224 71145.8 0.6 0.1 new_dt = options.dt_init / max( 701 55112 1243687.2 22.6 1.4 1e-10, np.mean(self.d_psi_sq_vals[-window:]) 702 ) 703 55112 701345.3 12.7 0.8 self.tentative_dt = np.clip(0.5 * (new_dt + dt), 0, self.dt_max) 704 705 55134 22576.8 0.4 0.0 results = [dt, psi, mu, supercurrent, normal_current, A_induced] 706 55134 12880.6 0.2 0.0 if self.dynamic_vector_potential: 707 results.append(current_A_applied) 708 55134 9679.6 0.2 0.0 if self.dynamic_epsilon: 709 results.append(epsilon) 710 55134 106191.4 1.9 0.1 return SolverResult(*results) 91.02 seconds - .../lib/python3.10/site-packages/tdgl/solver/solver.py:575 - update ```
loganbvh commented 7 months ago

For such a small mesh, you're almost definitely limited by overheads related to CPU/GPU synchronization and data transfer. In the all CPU case each call to TDGLSolver.update() only takes 1 ms, so even if CPU/GPU sync and data transfer only takes a fraction of a ms, using the GPU ends up not being worth it. My other tests (https://github.com/loganbvh/py-tdgl/issues/34#issuecomment-1732427524) showed ~30% speedup with GPU + SuperLU for meshes of size 27,000 and 78,000. The exact speedup (or slowdown) probably also depends on the specific CPU, GPU, and memory hardware

reed-foster commented 7 months ago

Ah that makes sense. Thanks!

loganbvh commented 7 months ago

Closing via https://github.com/loganbvh/py-tdgl/pull/75